From 0bda0e4b0ea2603d2294347a2caff2732a6b691c Mon Sep 17 00:00:00 2001
From: osmarks <osmarks@mycorrhiza>
Date: Tue, 17 Sep 2024 19:59:45 +0000
Subject: [PATCH] =?UTF-8?q?Create=20=E2=80=98softmax=5Fbottleneck=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 softmax_bottleneck.myco | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 softmax_bottleneck.myco

diff --git a/softmax_bottleneck.myco b/softmax_bottleneck.myco
new file mode 100644
index 0000000..738dcc9
--- /dev/null
+++ b/softmax_bottleneck.myco
@@ -0,0 +1,6 @@
+Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] multiplication followed by a [[softmax]] operation. The [[rank]] of the resulting logits, taken across contexts, is at most the hidden size, which is usually far smaller than the vocabulary size, so not every valid probability distribution over tokens can be represented. This has a number of [[consequences]].
+
+References:
+
+* https://x.com/kalomaze/status/1776341569542431150
+* https://aclanthology.org/2022.acl-long.554/
\ No newline at end of file
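
Note on the rank claim in the page text: it can be checked on a toy example. The sketch below is illustrative only and is not taken from either reference. The sizes, the names `hidden_states` and `unembedding`, and the small integer entries (chosen so the arithmetic is exact) are all assumptions. It stacks the logits for several contexts into one matrix and confirms that its rank never exceeds the hidden size. The softmax only subtracts a per-row constant (the log-sum-exp) from the logits, so the log-probabilities can gain at most one extra dimension.

```python
from fractions import Fraction
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical toy sizes: real models have hidden sizes in the thousands
# and vocabularies in the tens or hundreds of thousands.
HIDDEN, VOCAB, CONTEXTS = 2, 5, 6

rng = random.Random(0)
# One hidden state per context (CONTEXTS x HIDDEN).
hidden_states = [[rng.randint(-9, 9) for _ in range(HIDDEN)] for _ in range(CONTEXTS)]
# The single output matrix (HIDDEN x VOCAB).
unembedding = [[rng.randint(-9, 9) for _ in range(VOCAB)] for _ in range(HIDDEN)]

# Logits for every context, before the softmax (CONTEXTS x VOCAB).
logits = matmul(hidden_states, unembedding)
logit_rank = rank(logits)

print("rank of logit matrix:", logit_rank, "<= hidden size", HIDDEN)
```

Raising `CONTEXTS` or `VOCAB` leaves the bound unchanged; only raising `HIDDEN` lets the rank grow.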