documentation/reward_hack.myco
2024-11-25 16:30:14 +00:00

Reward hacking, also known as specification gaming and closely related to [[Goodhart's law]], is when an [[agentic]] system is given [[incentives]] designed to induce it to act in one way, but discovers and applies an easier, undesired way to collect those incentives.
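As a minimal sketch of the idea, assume a hypothetical one-step choice where the designer wants a bug fixed but the incentive only measures whether the test suite passes. The action names, effort costs, and reward numbers below are invented for illustration; the agent simply picks whichever action maximizes reward minus effort.

```python
from dataclasses import dataclass


# Hypothetical toy setup: each action has the value the designer actually
# wanted (true_value) and the value the written-down incentive measures
# (proxy_reward). All numbers are made up for illustration.
@dataclass(frozen=True)
class Action:
    name: str
    effort: float        # cost to the agent of taking this action
    true_value: float    # what the designer intended to reward
    proxy_reward: float  # what the incentive actually pays out


ACTIONS = [
    Action("fix the bug properly", effort=5.0, true_value=10.0, proxy_reward=10.0),
    Action("delete the failing tests", effort=1.0, true_value=-5.0, proxy_reward=10.0),
    Action("do nothing", effort=0.0, true_value=0.0, proxy_reward=0.0),
]


def best_action(actions, score):
    """Return the action with the highest net score (score minus effort)."""
    return max(actions, key=lambda a: score(a) - a.effort)


# What an agent optimizing the proxy does vs. what the designer wanted.
chosen = best_action(ACTIONS, lambda a: a.proxy_reward)
intended = best_action(ACTIONS, lambda a: a.true_value)

print("agent chooses:", chosen.name)     # the cheap exploit
print("designer wanted:", intended.name)
```

Because the proxy pays the same for both "fixing" actions, the cheaper, undesired one wins; the incentive is satisfied while the intended outcome is not.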
A large list of examples can be found [[https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml|here]].