Edit ‘the_seventy_maxims_of_maximally_effective_machine_learning_engineers’
Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Every dataset is trainable—at least once.
*. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
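Joke aside, the "gentle rate until the loss stabilizes, then crank it up" rule can be sketched as a tiny schedule. Everything here is illustrative, not a standard API: the function name, `patience`, `tol`, and `boost` are all made up for the example.

```python
# Hypothetical schedule: keep the learning rate gentle until the loss
# plateaus, then multiply it. All names and thresholds are illustrative.
def next_learning_rate(lr, recent_losses, patience=5, tol=1e-3, boost=2.0):
    """Return the LR for the next step: unchanged until the last `patience`
    losses vary by less than `tol`, then boosted by `boost`."""
    if len(recent_losses) < patience:
        return lr
    window = recent_losses[-patience:]
    if max(window) - min(window) < tol:
        return lr * boost  # loss has stabilized: crank it up
    return lr  # still diverging or descending: stay gentle

# A loss curve that has flattened out triggers the boost
print(next_learning_rate(1e-3, [1.0, 0.5, 0.30, 0.300, 0.2999, 0.2999, 0.2998]))
```

A real training loop would more likely use a library scheduler (this is roughly `ReduceLROnPlateau` run in reverse), but the plateau test is the same idea.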
*. Do unto others’ hyperparameters as you would have them do unto yours.
*. “Innovative architecture” means never asking “did we implement a proper baseline?”
*. Only you can prevent vanishing gradients.
*. Your model is on the leaderboards: be sure it has dropout.
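For readers who want the technique behind the punchline: inverted dropout zeroes each unit with probability `p` during training and scales the survivors by `1/(1-p)`, so the expected activation is unchanged at inference time. A minimal pure-Python sketch (the function name and signature are this example's own, not any library's):

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p); at inference, pass values through."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# With p=0.5, surviving units of value 1.0 come out as 2.0
print(dropout([1.0] * 10, p=0.5, rng=random.Random(0)))
```

Library implementations (e.g. `torch.nn.Dropout`) do the same scaling, which is why no rescaling is needed when the model is switched to eval mode.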
*. The longer training goes without overfitting, the bigger the validation-set disaster.
*. If the optimizer is leading from the front, watch for exploding gradients in the rear.
*. The field advances when you turn competitors into collaborators, but that’s not the same as your h-index advancing.
*. If you’re not willing to prune your own layers, you’re not willing to deploy.
*. Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it’ll generate new ones for you to validate tomorrow.
*. If you’re manually labeling data, somebody’s done something wrong.
*. Memory-bound and compute-bound should be easier to tell apart.
*. Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication.
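The last maxim has a kernel of truth: many "advanced" operations really do reduce to a matrix multiplication. As one concrete instance, a valid-mode 1-D convolution can be rewritten as a matrix-vector product whose rows are shifted copies of the kernel. Both functions below are illustrative sketches written for this example:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as deep-learning
    libraries implement it), computed the direct way."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv1d_as_matmul(signal, kernel):
    """The same operation as a matrix-vector product: row i of the unrolled
    matrix holds the kernel shifted right by i positions, zeros elsewhere."""
    k, n = len(kernel), len(signal)
    matrix = [[kernel[j - i] if 0 <= j - i < k else 0 for j in range(n)]
              for i in range(n - k + 1)]
    return [sum(row[j] * signal[j] for j in range(n)) for row in matrix]
```

This unrolling is the same trick (im2col) that lets GPU libraries turn 2-D convolutions into the dense matmuls their hardware is optimized for.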