Edit ‘softmax_bottleneck’
This commit is contained in:
parent
0bda0e4b0e
commit
aaa4bfd79c
@@ -1,4 +1,4 @@
-Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] and a [[softmax]] operation. The [[rank]] of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. This has a number of [[consequences]].
+Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] and a [[softmax]] operation. The [[rank]] of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. Some mixtures of tokens are not representable without also assigning high probability to unintended tokens, particularly where such a mixture would be uncommon in the training data. This has a number of [[consequences]].
 
 References:
 
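The rank limit described in the edited paragraph can be checked numerically. The sketch below (an illustration, not from the commit; the sizes `d`, `V`, and `n_contexts` are arbitrary assumptions) builds a random unembedding matrix and hidden states, and shows that the matrix of log-probabilities across many contexts has rank at most `d + 1` — far below the vocabulary size, so most points of the probability simplex are unreachable:

```python
import numpy as np

rng = np.random.default_rng(0)
d, V, n_contexts = 4, 32, 100  # hidden size, vocab size, number of contexts

W = rng.normal(size=(V, d))           # unembedding matrix (vocab x hidden)
H = rng.normal(size=(n_contexts, d))  # one hidden state per context

logits = H @ W.T  # (n_contexts, V); rows lie in a d-dimensional subspace
# Log-softmax: subtracting the per-row log-sum-exp adds at most rank 1.
logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

rank = np.linalg.matrix_rank(logp)
print(rank, "<=", d + 1, "<<", min(n_contexts, V))
```

Because `logits = H W^T` has rank at most `d` and the softmax normalization only shifts each row by a constant, the achievable log-probability matrices have rank at most `d + 1` no matter how many contexts are sampled.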