Edit ‘the_seventy_maxims_of_maximally_effective_machine_learning_engineers’
This commit is contained in:
parent 8734c61051
commit d8f3a35b7e
@ -29,14 +29,14 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Don’t be afraid to be the first to try a random seed.
*. If the cost of cloud compute is high enough, you might get promoted for shutting down idle instances.
*. The enemy of my bias is my variance. No more. No less.
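For readers who want the kernel of truth behind the bias/variance maxim, here is a minimal stdlib-only sketch (all names, `TRUE_MEAN` and `shrunk_mean` included, are hypothetical): shrinking an estimator toward zero adds bias but reduces variance.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 5.0

def shrunk_mean(sample, shrink):
    """Shrink the sample mean toward 0: more shrinkage = more bias, less variance."""
    return (1 - shrink) * statistics.fmean(sample)

def bias_and_variance(shrink, trials=2000, n=10):
    """Estimate bias and variance of the shrunk estimator by repeated sampling."""
    estimates = [
        shrunk_mean([random.gauss(TRUE_MEAN, 3.0) for _ in range(n)], shrink)
        for _ in range(trials)
    ]
    bias = statistics.fmean(estimates) - TRUE_MEAN
    var = statistics.pvariance(estimates)
    return bias, var

# More shrinkage: the bias grows in magnitude while the variance falls.
for s in (0.0, 0.3):
    print(s, bias_and_variance(s))
```

No more, no less: the two quantities trade off directly against the `shrink` knob.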
*. A little dropout goes a long way. The less you use, the further your gradients backpropagate.
*. A little inductive bias goes a long way. The less you use, the further you'll scale.
*. Only overfitters prosper (temporarily).
*. Any model is production-ready if you can containerize it.
*. If you’re logging metrics, you’re being audited.
*. If you’re seeing NaN, you need a smaller learning rate.
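The NaN maxim is real advice in disguise: a diverging loss usually means the step size is too large. A minimal sketch of the idea (the `safe_lr_schedule` helper and its numbers are hypothetical, not from the maxims):

```python
import math

def safe_lr_schedule(losses, lr=0.1, factor=0.5):
    """Halve the learning rate each time a NaN loss shows up.

    `losses` is a stream of observed training losses; returns the
    learning rate that survives the run. Purely illustrative.
    """
    for loss in losses:
        if math.isnan(loss):
            lr *= factor  # NaN usually means the update step was too big
    return lr

# Two NaN spikes cut the LR from 0.1 to 0.025.
print(safe_lr_schedule([0.9, float("nan"), 0.7, float("nan"), 0.5]))
```

In practice you would also want gradient clipping or a loss-scale check, but the maxim's core remedy is exactly this knob.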
*. That which does not break your model has made a suboptimal adversarial example.
*. When the loss plateaus, the wise call for more data.
*. There is no “overkill.” There is only “more epochs” and “CUDA out of memory.”
*. There is no “overkill.” There is only “more tokens” and “CUDA out of memory.”
*. What’s trivial in Jupyter can still crash in production.
*. There’s a difference between spare GPUs and GPUs you’ve accidentally mined Ethereum on.
*. Not all NaN is a bug—sometimes it’s a feature.
@ -58,7 +58,7 @@ Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maxima
*. Data engineers exist to format tables for people with real GPUs.
*. Reinforcement learning exists to burn through compute budgets on simulated environments.
*. The whiteboard is mightiest when it sketches architectures for more transformers.
*. “Two dropout layers is probably not going to be enough.”
*. “Two baselines is probably not going to be enough.”
*. A model’s inference time is inversely proportional to the urgency of the demo.
*. Don’t bring BERT into a logistic regression.
*. Any tensor labeled “output” is dangerous at both ends.