diff --git a/reward_hack.myco b/reward_hack.myco new file mode 100644 index 0000000..4033b55 --- /dev/null +++ b/reward_hack.myco @@ -0,0 +1,3 @@ +Reward hacking, also known as specification gaming and approximately [[Goodhart's law]], is when an [[agentic]] system is given [[incentives]] designed to induce it to act in one way but discovers and applies an easier, undesired way to + +A large list of examples can be found [[https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml|here]]. \ No newline at end of file