From 66814844ca59b47829a8b18c168c2ddf1b02ef50 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Sat, 22 Mar 2025 11:01:04 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98the=5Fseventy=5Fmaxims=5Fof=5Fm?=
 =?UTF-8?q?aximally=5Feffective=5Fmachine=5Flearning=5Fengineers=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...axims_of_maximally_effective_machine_learning_engineers.myco | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
index 226b1d0..4e8e0da 100644
--- a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
+++ b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
@@ -33,7 +33,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
 *. Only overfitters prosper (temporarily).
 *. Any model is production-ready if you can containerize it.
 *. If you’re logging metrics, you’re being audited.
-*. If you’re seeing NaN, you need a smaller learning rate.
+*. If you’re leaving GPUs unused, you need a bigger model.
 *. That which does not break your model has made a suboptimal adversarial example.
 *. When the loss plateaus, the wise call for more data.
 *. There is no “overkill.” There is only “more tokens” and “CUDA out of memory.”