From d4259a79b5092f128de4369a75ca8609d8f45d64 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Fri, 3 Oct 2025 10:56:27 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98the=5Fseventy=5Fmaxims=5Fof=5Fm?=
 =?UTF-8?q?aximally=5Feffective=5Fmachine=5Flearning=5Fengineers=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...axims_of_maximally_effective_machine_learning_engineers.myco | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
index 9a49f19..5b1b800 100644
--- a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
+++ b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
@@ -13,7 +13,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
 *. Every dataset is trainable—at least once.
 *. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
 *. Do unto others’ hyperparameters as you would have them do unto yours.
-*. “Innovative architecture” means never asking, “What’s the worst thing this could hallucinate?”
+*. “Innovative architecture” means never asking “did we implement the baseline correctly?”
 *. Only you can prevent vanishing gradients.
 *. Your model is in the leaderboards: be sure it has dropout.
 *. The longer training goes without overfitting, the bigger the validation-set disaster.