diff --git a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
index 807a5b3..0ae4774 100644
--- a/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
+++ b/the_seventy_maxims_of_maximally_effective_machine_learning_engineers.myco
@@ -14,7 +14,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
 *. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
 *. Do unto others’ hyperparameters as you would have them do unto yours.
 *. “Innovative architecture” means never asking “did we implement a proper baseline?”
-*. Only you can prevent vanishing gradients.
+*. Only you can prevent reward hacking.
 *. Your model is in the leaderboards: be sure it has dropout.
 *. The longer training goes without overfitting, the bigger the validation-set disaster.
 *. If the optimizer is leading from the front, watch for exploding gradients in the rear.