From aaa4bfd79cc61c358a754616e344ad6089cc0611 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Tue, 17 Sep 2024 20:00:24 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98softmax=5Fbottleneck=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 softmax_bottleneck.myco | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmax_bottleneck.myco b/softmax_bottleneck.myco
index 738dcc9..202dc26 100644
--- a/softmax_bottleneck.myco
+++ b/softmax_bottleneck.myco
@@ -1,4 +1,4 @@
-Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] and a [[softmax]] operation. The [[rank]] of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. This has a number of [[consequences]].
+Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] and a [[softmax]] operation. The [[rank]] of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. Some mixtures of tokens cannot be represented without spuriously assigning high probability to additional tokens, particularly where such a mixture would be uncommon in the training data. This has a number of [[consequences]].
 
 References:
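The effect described in the edited paragraph can be demonstrated numerically. The following is a minimal sketch, not taken from the patch: it assumes a random unembedding matrix `W` with hidden size well below vocabulary size, and uses a least-squares fit in log space as a simple stand-in for the best achievable output distribution. A sharp two-token mixture then cannot be matched without probability leaking onto other tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 2                      # vocabulary size, hidden size (d << V)
W = rng.normal(size=(V, d))      # assumed random unembedding matrix

# Target: an even mixture over two tokens, near-zero mass elsewhere.
p = np.full(V, 1e-4)
p[0] = p[1] = (1 - (V - 2) * 1e-4) / 2
p /= p.sum()

# Achievable distributions are softmax(W @ h), so the log-probabilities must
# lie in col(W) plus a constant shift (softmax ignores additive constants).
# Fit h (and the shift) to the log-target by least squares.
A = np.hstack([W, np.ones((V, 1))])
coef, *_ = np.linalg.lstsq(A, np.log(p), rcond=None)
h = coef[:d]

logits = W @ h
q = np.exp(logits - logits.max())
q /= q.sum()

print("target distribution:  ", np.round(p, 3))
print("best achievable (d=2):", np.round(q, 3))
# With d=2, q typically assigns visible probability to tokens outside the
# intended pair, illustrating the rank limit on representable distributions.
```

The least-squares fit is a proxy for the true KL-optimal hidden state, but any choice of `h` faces the same constraint: the logits are confined to a d-dimensional subspace, so the unwanted mass cannot in general be driven to zero.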