Edit ‘the_seventy_maxims_of_maximally_effective_machine_learning_engineers’
This commit is contained in:
parent 8734c61051
commit d8f3a35b7e
@ -29,14 +29,14 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Don’t be afraid to be the first to try a random seed.
*. If the cost of cloud compute is high enough, you might get promoted for shutting down idle instances.
*. The enemy of my bias is my variance. No more. No less.
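For readers who want the kernel of truth behind the bias/variance maxim, here is a minimal stdlib-only sketch (all names, `TRUE_MEAN` and `shrunk_mean` included, are hypothetical): shrinking an estimator toward zero adds bias but reduces variance.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 5.0

def shrunk_mean(sample, shrink):
    """Shrink the sample mean toward 0: more shrinkage = more bias, less variance."""
    return (1 - shrink) * statistics.fmean(sample)

def bias_and_variance(shrink, trials=2000, n=10):
    """Estimate bias and variance of the shrunk estimator by repeated sampling."""
    estimates = [
        shrunk_mean([random.gauss(TRUE_MEAN, 3.0) for _ in range(n)], shrink)
        for _ in range(trials)
    ]
    bias = statistics.fmean(estimates) - TRUE_MEAN
    var = statistics.pvariance(estimates)
    return bias, var

# More shrinkage: the bias grows in magnitude while the variance falls.
for s in (0.0, 0.3):
    print(s, bias_and_variance(s))
```

No more, no less: the two quantities trade off directly against the `shrink` knob.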
*. A little dropout goes a long way. The less you use, the further your gradients backpropagate.
*. A little inductive bias goes a long way. The less you use, the further you'll scale.
*. Only overfitters prosper (temporarily).
*. Any model is production-ready if you can containerize it.
*. If you’re logging metrics, you’re being audited.
*. If you’re seeing NaN, you need a smaller learning rate.
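The NaN maxim is real advice in disguise: a diverging loss usually means the step size is too large. A minimal sketch of the idea (the `safe_lr_schedule` helper and its numbers are hypothetical, not from the maxims):

```python
import math

def safe_lr_schedule(losses, lr=0.1, factor=0.5):
    """Halve the learning rate each time a NaN loss shows up.

    `losses` is a stream of observed training losses; returns the
    learning rate that survives the run. Purely illustrative.
    """
    for loss in losses:
        if math.isnan(loss):
            lr *= factor  # NaN usually means the update step was too big
    return lr

# Two NaN spikes cut the LR from 0.1 to 0.025.
print(safe_lr_schedule([0.9, float("nan"), 0.7, float("nan"), 0.5]))
```

In practice you would also want gradient clipping or a loss-scale check, but the maxim's core remedy is exactly this knob.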
*. That which does not break your model has made a suboptimal adversarial example.
*. When the loss plateaus, the wise call for more data.
*. There is no “overkill.” There is only “more epochs” and “CUDA out of memory.”
*. There is no “overkill.” There is only “more tokens” and “CUDA out of memory.”
*. What’s trivial in Jupyter can still crash in production.
*. There’s a difference between spare GPUs and GPUs you’ve accidentally mined Ethereum on.
*. Not all NaN is a bug—sometimes it’s a feature.
@ -58,7 +58,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Data engineers exist to format tables for people with real GPUs.
*. Reinforcement learning exists to burn through compute budgets on simulated environments.
*. The whiteboard is mightiest when it sketches architectures for more transformers.
*. “Two dropout layers is probably not going to be enough.”
*. “Two baselines is probably not going to be enough.”
*. A model’s inference time is inversely proportional to the urgency of the demo.
*. Don’t bring BERT into a logistic regression.
*. Any tensor labeled “output” is dangerous at both ends.