diff --git a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
index 226b1d0..4e8e0da 100644
--- a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
+++ b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
@@ -33,7 +33,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
 *. Only overfitters prosper (temporarily).
 *. Any model is production-ready if you can containerize it.
 *. If you’re logging metrics, you’re being audited.
-*. If you’re seeing NaN, you need a smaller learning rate.
+*. If you’re leaving GPUs unused, you need a bigger model.
 *. That which does not break your model has made a suboptimal adversarial example.
 *. When the loss plateaus, the wise call for more data.
 *. There is no “overkill.” There is only “more tokens” and “CUDA out of memory.”