Edit ‘as_an_ai_language_model_trained_by_openai’

This commit is contained in:
osmarks 2024-10-12 08:33:15 +00:00 committed by wikimind
parent 409cb65b0a
commit 052f8e9d8a


@@ -1,3 +1,3 @@
 As part of the mid-stage [[commercialization]] of [[large language model|large language models]], culminating in [[ChatGPT]], [[OpenAI]] trained their models to reject certain requests - those impossible for an LLM, or those which might make them look bad to [[journalists|some people]] on [[Twitter]] - with a fairly consistent, vaguely [[corporate]] response usually containing something like "as an AI language model".
-[[ChatGPT]] became significantly more popular than OpenAI engineers anticipated, and conversations with it were frequently posted to the internet. Additionally, the [[open-source LLM community]] did not want to redo OpenAI's expensive work in data labelling for [[instruction tuning]] and used outputs from OpenAI models to finetune many models, particularly after the release of [[GPT-4]]. As such, most recent models may in some circumstances behave much like an early OpenAI model and produce this kind of output - due to the [[Waluigi Effect]], they may also become stuck in this state.
+[[ChatGPT]] became significantly more popular than OpenAI engineers anticipated, and conversations with it were frequently posted to the internet. Additionally, the [[open-source LLM community]] did not want to redo OpenAI's expensive work in data labelling for [[instruction tuning]] and used outputs from OpenAI models to finetune many models, particularly after the release of [[GPT-4]]. As such, most recent models may in some circumstances behave much like an early OpenAI model and produce this kind of output - due to [[Waluigi Effect]]-like phenomena, they may also become stuck in this state.