Edit ‘the_seventy_maxims_of_maximally_effective_machine_learning_engineers’
@@ -14,7 +14,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
 *. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
 *. Do unto others’ hyperparameters as you would have them do unto yours.
 *. “Innovative architecture” means never asking “did we implement a proper baseline?”
-*. Only you can prevent vanishing gradients.
+*. Only you can prevent reward hacking.
 *. Your model is in the leaderboards: be sure it has dropout.
 *. The longer training goes without overfitting, the bigger the validation-set disaster.
 *. If the optimizer is leading from the front, watch for exploding gradients in the rear.
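For anyone who wants the non-parody reading of the "gentle learning rate" maxim in the hunk above, here is a minimal sketch of a linear learning-rate warmup in PyTorch: start small, ramp up once training has had a chance to stabilize. The model, optimizer, and step counts are placeholders for illustration, not anything from this repository.

# Illustrative only: linear warmup in the spirit of the "gentle learning rate" maxim.
# Model, loss, and step counts below are stand-ins.
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(128, 10)                       # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 1000

def warmup(step: int) -> float:
    # Multiplicative factor on the base lr: ramps from ~0 to 1.0
    # over warmup_steps, then holds steady.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=warmup)

for step in range(5000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 128)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()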