From 6be8909f4a342999376dcf6dffe9bb433ee9d197 Mon Sep 17 00:00:00 2001
From: osmarks <osmarks@mycorrhiza>
Date: Thu, 5 Sep 2024 19:40:05 +0000
Subject: [PATCH] =?UTF-8?q?Create=20=E2=80=98large=5Flanguage=5Fmodel?=
 =?UTF-8?q?=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 large_language_model.myco | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 large_language_model.myco

diff --git a/large_language_model.myco b/large_language_model.myco
new file mode 100644
index 0000000..59f564a
--- /dev/null
+++ b/large_language_model.myco
@@ -0,0 +1,17 @@
+A large language model (LLM) is a [[neural net]] model of [[language]] which is [[large]], usually in the sense of parameter count or total [[compute]], making it [[good]] at text prediction by [[scaling laws]]. Usually LLMs are [[autoregressive]] and pretrained on general text data with a next-token prediction [[loss function]], though this is not strictly required. The largest LLMs known have around 2 trillion parameters, though the smallest LLM is not known.
+
+== History
+
+The era of LLMs is often taken to have begun with [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]], published in 2017, though it draws heavily on previous work in [[machine translation]]. This paper introduced the [[Transformer]] architecture used in most modern LLMs, though with only ~1e8 parameters and supervised training data, its models are not central examples.
+
+The modern decoder-only architecture and [[self-supervised]] pretraining derive from [[https://openai.com/index/language-unsupervised/|OpenAI's GPT-1]], which demonstrated that then-large amounts of compute (240 GPU-days with unspecified GPUs) could be very effective across tasks with unspecialized training data and architecture. [[https://openai.com/index/better-language-models/|GPT-2]], which changed little but scaled further, was widely regarded as a [[cool toy]] by those aware of it. Those within [[OpenAI]] apparently understood the promise of scaling, however, as it was followed up by [[https://arxiv.org/abs/2005.14165|GPT-3]], which added several orders of magnitude more compute, resulting in enough capability to be both [[useful]] and [[fearsome]].
+
+Development slowed at this point due to design and compute limits. In the [[https://arxiv.org/abs/2203.15556|Chinchilla paper]], [[Google DeepMind]] showed that existing [[scaling laws]] were [[wrong]] and that models could be trained substantially more efficiently by reducing parameter count and using additional training data. [[https://arxiv.org/abs/2308.00951|Mixture of Experts architectures]] granted further compute efficiency. It was not until March 2023 - by which time several organizations had produced GPT-3-level models - that OpenAI announced [[GPT-4]] (though it had been finished in August 2022 and kept secret for 8 months for "safety testing"). It represented a very significant advance and, combined with the [[instruction tuning]] which made [[ChatGPT]] a user-friendly product, crushed all competitors. Following this, however, was the [[GPT-4 Wall]].
+
+== Applications
+
+* [[Autogollark]]
+* [[Code generation]]
+* [[Email jobs]]
+* [[Paperclip manufacturing]]
+* [[ERP]]
\ No newline at end of file