diff --git a/softmax_bottleneck.myco b/softmax_bottleneck.myco
new file mode 100644
index 0000000..738dcc9
--- /dev/null
+++ b/softmax_bottleneck.myco
@@ -0,0 +1,6 @@
+Almost all modern [[large language model|LLMs]] map relatively low-dimensional hidden states to high-dimensional probability distributions over [[tokenizer|tokens]] using a single [[matrix]] and a [[softmax]] operation. The [[rank]] of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. This has a number of [[consequences]].
+
+References:
+
+* https://x.com/kalomaze/status/1776341569542431150
+* https://aclanthology.org/2022.acl-long.554/
\ No newline at end of file
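The rank limit stated in the note can be checked on a toy example. The sketch below is not taken from the patch or its references: the names `matrix_rank`, `logits_matrix`, `hidden_size`, `vocab_size`, and `n_contexts` are illustrative, and small random integer matrices are assumed so that the rank can be computed exactly with rational arithmetic. It only examines the pre-softmax logits; since log-softmax subtracts one constant per row, the matrix of log-probabilities can have rank at most one higher than the logits.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Exact rank of a matrix, via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def logits_matrix(hidden_states, output_embeddings):
    """logits[i][j] = dot(hidden_states[i], output_embeddings[j])."""
    return [
        [sum(h * w for h, w in zip(state, embedding)) for embedding in output_embeddings]
        for state in hidden_states
    ]


# Hypothetical toy sizes: hidden size 3, vocabulary of 8 tokens, 10 contexts.
rng = random.Random(0)
hidden_size, vocab_size, n_contexts = 3, 8, 10
H = [[rng.randint(-5, 5) for _ in range(hidden_size)] for _ in range(n_contexts)]
W = [[rng.randint(-5, 5) for _ in range(hidden_size)] for _ in range(vocab_size)]

logits = logits_matrix(H, W)
logit_rank = matrix_rank(logits)

# The logits form a 10 x 8 matrix, but its rank cannot exceed the hidden size.
print(f"logits shape: {n_contexts} x {vocab_size}, rank: {logit_rank}")
```

An unconstrained 10 x 8 matrix of target log-probabilities could have rank up to 8, so with a hidden size of 3 most such targets cannot be matched exactly, no matter how `H` and `W` are chosen.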