Based on [[https://schlockmercenary.fandom.com/wiki/The_Seventy_Maxims_of_Maximally_Effective_Mercenaries|The Seventy Maxims of Maximally Effective Mercenaries]]. This was suggested by Not Louis, and not Louis. Written by [[DeepSeek-R1]].
*. Feature importance and data leakage should be easier to tell apart.
*. If increasing model complexity wasn’t your last resort, you failed to add enough layers.
*. If the accuracy is high enough, stakeholders will stop complaining about the compute costs.
*. Harsh critiques have their place—usually in the rejected pull requests.
*. Never turn your back on a deployed model.
*. Sometimes the only way out is through… through another epoch.
*. Every dataset is trainable—at least once.
*. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up. (A warmup sketch follows the list.)
*. Do unto others’ hyperparameters as you would have them do unto yours.
*. “Innovative architecture” means never asking, “What’s the worst thing this could hallucinate?”
*. Only you can prevent vanishing gradients.
*. Your model is on the leaderboards: be sure it has dropout.
*. The longer training goes without overfitting, the bigger the validation-set disaster. (An early-stopping sketch follows the list.)
*. If the optimizer is leading from the front, watch for exploding gradients in the rear.
*. The field advances when you turn competitors into collaborators, but that’s not the same as your h-index advancing.
*. If you’re not willing to prune your own layers, you’re not willing to deploy.
*. Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it’ll generate new ones for you to validate tomorrow.
*. If you’re manually labeling data, somebody’s done something wrong.
*. Training loss and validation loss should be easier to tell apart.
*. Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication. (A demonstration follows the list.)
*. If your model’s failure is covered by the SLA, you didn’t test enough edge cases.
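For the maxim on gentle learning rates, a minimal warmup sketch in plain Python. The names (`warmup_lr`, `base_lr`, `warmup_steps`) are illustrative, not from any particular library.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=500):
    """Return the learning rate for a given optimizer step."""
    if step < warmup_steps:
        # Gentle start: scale the rate up linearly from near zero.
        return base_lr * (step + 1) / warmup_steps
    # The loss has (we hope) stabilized: crank it up to the full rate.
    return base_lr

print(warmup_lr(0))       # 2e-06: turneth away divergence
print(warmup_lr(10_000))  # 0.001: divergence is looking the other way
```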
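For the maxim on validation-set disasters, a hedged early-stopping sketch. `train_one_epoch` and `validate` are hypothetical stand-ins for whatever your framework provides.

```python
def train_with_early_stopping(train_one_epoch, validate,
                              max_epochs=100, patience=3):
    """Stop once validation loss has not improved for `patience` epochs."""
    best_val, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # the disaster has announced itself; leave now
    return best_val
```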
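And for the maxim on sufficiently advanced algorithms, a dense layer shown to be exactly one matrix multiplication, plus a bias and a clamp. The shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # a batch of 4 inputs with 8 features
W = rng.normal(size=(8, 16))   # weights of a "sufficiently advanced" layer
b = rng.normal(size=(16,))     # bias

hidden = np.maximum(x @ W + b, 0.0)  # matmul, add, ReLU: the whole layer
print(hidden.shape)  # (4, 16)
```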