Edit ‘the_seventy_maxims_of_maximally_effective_machine_learning_engineers’

osmarks
2025-10-03 10:56:47 +00:00
committed by wikimind
parent d4259a79b5
commit e845191190


@@ -13,14 +13,14 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Every dataset is trainable—at least once.
*. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
*. Do unto others' hyperparameters as you would have them do unto yours.
-*. “Innovative architecture” means never asking “did we implement the baseline correctly?”
+*. “Innovative architecture” means never asking “did we implement a proper baseline?”
*. Only you can prevent vanishing gradients.
*. Your model is in the leaderboards: be sure it has dropout.
*. The longer training goes without overfitting, the bigger the validation-set disaster.
*. If the optimizer is leading from the front, watch for exploding gradients in the rear.
*. The field advances when you turn competitors into collaborators, but that's not the same as your h-index advancing.
*. If you're not willing to prune your own layers, you're not willing to deploy.
-*. Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it'll generate new ones for you to validate tomorrow.
+*. Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised” and it'll generate new ones for you to validate tomorrow.
*. If you're manually labeling data, somebody's done something wrong.
*. Memory-bound and compute-bound should be easier to tell apart.
*. Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication.