Edit ‘the_seventy_maxims_of_effective_machine_learning_engineers’

osmarks 2025-03-01 22:23:10 +00:00 committed by wikimind
parent 59079062c9
commit ccf8156605


* Preprocess, then train.
* A training loop in motion outranks a perfect architecture that isn’t implemented.
* A debugger with a stack trace outranks everyone else.
* Regularization covers a multitude of overfitting sins. (See the early-stopping sketch below.)
* Feature importance and data leakage should be easier to tell apart.
* If increasing model complexity wasn’t your last resort, you failed to add enough layers.
* If the accuracy is high enough, stakeholders will stop complaining about the compute costs.
* Harsh critiques have their place—usually in the rejected pull requests.
* Never turn your back on a deployed model.
* Sometimes the only way out is through… through another epoch.
* Every dataset is trainable—at least once.
* A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
* Do unto others’ hyperparameters as you would have them do unto yours.
* “Innovative architecture” means never asking, “What’s the worst thing this could hallucinate?”
* Only you can prevent vanishing gradients.
* Your model is in the leaderboards: be sure it has dropout.
* The longer training goes without overfitting, the bigger the validation-set disaster.
* If the optimizer is leading from the front, watch for exploding gradients in the rear.
* The field advances when you turn competitors into collaborators, but that’s not the same as your h-index advancing.
* If you’re not willing to prune your own layers, you’re not willing to deploy.
* Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it’ll generate new ones for you to validate tomorrow.
* If you’re manually labeling data, somebody’s done something wrong.
* Training loss and validation loss should be easier to tell apart.
* Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication.
* If your model’s failure is covered by the SLA, you didn’t test enough edge cases.
* “Fire-and-forget training” is fine, provided you never actually forget to monitor drift.
* Don’t be afraid to be the first to try a random seed.
* If the cost of cloud compute is high enough, you might get promoted for shutting down idle instances.
* The enemy of my bias is my variance. No more. No less.
* A little dropout goes a long way. The less you use, the further the gradient backpropagates.
* Only overfitters prosper (temporarily).
* Any model is production-ready if you can containerize it.
* If you’re logging metrics, you’re being audited.
* If you’re seeing NaN, you need a smaller learning rate. (See the warmup sketch below.)
* That which does not break your model has made a suboptimal adversarial example.
* When the loss plateaus, the wise call for more data.
* There is no “overkill.” There is only “more epochs” and “CUDA out of memory.”
* What’s trivial in Jupyter can still crash in production.
* There’s a difference between spare GPUs and GPUs you’ve accidentally mined Ethereum on.
* Not all NaN is a bug—sometimes it’s a feature.
* “Do you have a checkpoint?” means “I can’t fix this training run.”
* “They’ll never expect this activation function” means “I want to try something non-differentiable.”
* If it’s a hack and it works, it’s still a hack and you’re lucky.
* If it can parallelize inference, it can double as a space heater.
* The size of the grant is inversely proportional to the reproducibility of the results.
* Don’t try to save money by undersampling.
* Don’t expect the data to cooperate in the creation of your dream benchmark.
* If it ain’t overfit, it hasn’t been trained for enough epochs.
* Every client is one missed deadline away from switching to AutoML, and every AutoML is one custom loss function away from becoming a client.
* If it only works on the training set, it’s defective.
* Let them see you tune the hyperparameters before you abandon the project.
* The framework you’ve got is never the framework you want.
* The data you’ve got is never the data you want.
* It’s only too many layers if you can’t fit them in VRAM.
* It’s only too many parameters if they’re multiplying NaNs.
* Data engineers exist to format tables for people with real GPUs.
* Reinforcement learning exists to burn through compute budgets on simulated environments.
* The whiteboard is mightiest when it sketches architectures for more transformers.
* “Two dropout layers is probably not going to be enough.”
* A model’s inference time is inversely proportional to the urgency of the demo.
* Don’t bring BERT into a logistic regression.
* Any tensor labeled “output” is dangerous at both ends.
* The CTO knows how to do it by knowing who Googled it.
* An ounce of precision is worth a pound of recall.
* After the merge, be the one with the main branch, not the one with the conflicts.
* Necessity is the mother of synthetic data.
* If you can’t explain it, cite the arXiv paper.
* Deploying with confidence intervals doesn’t mean you shouldn’t also deploy with a kill switch.
* Sometimes SOTA is a function of who had the biggest TPU pod.
* Failure is not an option—it is mandatory. The option is whether to let failure be the last epoch or a learning rate adjustment.
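
A couple of these maxims compress real advice. As an illustration of “a gentle learning rate turneth away divergence” and “if you’re seeing NaN, you need a smaller learning rate,” here is a minimal PyTorch-style warmup sketch; it is not prescribed by the maxims themselves, and the model, data, and hyperparameters are invented. The learning rate warms up from a gentle start, and the base rate is halved whenever the loss goes non-finite.

```python
import torch
from torch import nn

# Hypothetical toy setup; the model, data, and hyperparameters are illustrative only.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Gentle start: scale the base LR from 1% up to 100% over `warmup_steps`, then hold it there.
warmup_steps = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, 0.01 + step / warmup_steps)
)

for step in range(2000):
    x = torch.randn(32, 16)                 # stand-in batch
    y = x.sum(dim=1, keepdim=True)          # stand-in target
    loss = loss_fn(model(x), y)

    if not torch.isfinite(loss):
        # NaN/inf loss: skip this step and halve the base LR so the schedule keeps the cut.
        scheduler.base_lrs = [lr * 0.5 for lr in scheduler.base_lrs]
        optimizer.zero_grad(set_to_none=True)
        continue

    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    # Cheap insurance against exploding gradients in the rear.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```

Halving the rate on a non-finite loss is a blunt instrument; restoring from the last checkpoint with a lower rate is the more careful version of the same move.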
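
In the same spirit, “regularization covers a multitude of overfitting sins” and “training loss and validation loss should be easier to tell apart” amount to a standard recipe: dropout, weight decay, and early stopping on the validation loss. A minimal early-stopping sketch, again with made-up data and hypothetical hyperparameters:

```python
import copy
import torch
from torch import nn

# Illustrative stand-ins for real data loaders and a real model.
x_train, y_train = torch.randn(256, 16), torch.randn(256, 1)
x_val, y_val = torch.randn(64, 16), torch.randn(64, 1)

model = nn.Sequential(
    nn.Linear(16, 128), nn.ReLU(),
    nn.Dropout(p=0.3),  # regularization covering a multitude of overfitting sins
    nn.Linear(128, 1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.MSELoss()

best_val, best_state, patience, bad_epochs = float("inf"), None, 10, 0
for epoch in range(200):
    model.train()
    loss = loss_fn(model(x_train), y_train)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    # Watch the validation loss, not the training loss; they should be easy to tell apart.
    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_val), y_val).item()

    if val < best_val:
        best_val, best_state, bad_epochs = val, copy.deepcopy(model.state_dict()), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the validation-set disaster gets any bigger

model.load_state_dict(best_state)  # keep the weights from before the overfitting set in
```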