documentation/autogollark.myco

Autogollark is an [[emulation]] or primitive [[beta upload]] of [[gollark]] using a proprietary dataset of dumped [[Discord]] messages, [[semantic search]] and [[in-context learning]] on a [[base model]]. Currently, the system uses [[LLaMA-3.1-405B base]] in BF16 via Hyperbolic, [[AutoBotRobot]] code (though not presently its bot account) as a frontend and a custom [[PGVector]]-based search API. While not consistently coherent, Autogollark is able to approximately match personality and typing style.

Autogollark is much [[safer]] than [[instruction-tuned]] systems optimized based on human feedback, as there is no [[optimization pressure]] for user engagement or sycophancy.

== TODO

* {Reformat dataset to include longer-form conversation chunks for increased long-term coherence
* Done. Unclear whether this helped.
}
* Fix emoji/ping formatting.
* Writeable memory?
* {Fix lowercasing issue.
* Due to general personality stability. Need finetune or similar.
* One proposal: use internal finetune to steer big model somehow. Possibly: use its likelihood (prefill-only) to evaluate goodness of big model output wrt. gollark personality, and if it is too bad then use finetune directly.
}
* {Increased autonomy (wrt. responses).
* Use cheap classifier to evaluate when to respond.
* Should also allow unprompted messages somehow (polling, rerun after last message?).
}
* Tool capabilities (how to get the data? Examples in context only?!).
* Local finetune only? Would be more tonally consistent but dumber, I think.

== Versions

* Autogollark 0.1 was the initial RAG system and ABR interface. It used LLaMA-3.1-8B run locally. Autogollark 0.0, which is not real, used only gollark messages.
* Autogollark 0.2 replaced this with LLaMA-3.1-405B.
* Autogollark 0.3 upgraded the dataset to contain longer-form conversations than Autogollark 0.1.

== Emergent capabilities

Autogollark has emergently acquired some abilities which were not intended in the design.

* {Petulant nonresponse - due to ratelimits in the LLM API, Autogollark will under some circumstances not respond to messages, with error messages being consumed and not exposed to [[users]]. This can be interpreted by [[credulous]] users as choosing not to respond, though this is not believed to be possible (other than cases like responding with `.`, which ~~has not been observed~~ does not appear to be associated with annoyed states).
  * Automated failover has reduced this.}
* Memorizing links: Autogollark directly experiences past message chunks in context, granting perfect recall of a small amount of memory at once. This has memorably included [[Autogollark/Closed Individualism Incident|YouTube videos]] repeated with no context.
* {Limited self-improvement attempts: when told about this architecture, Autogollark will often complain about various limitations and propose vague ideas for improvements.
  * Also, Autogollark has previously claimed to be working on LLM chatbots.}
* {Inconsistent inference of own status as a language model chatbot, possibly based on seeing the name "autogollark". Often, Autogollark assumes use of GPT-3.
  * Autogollark will also sometimes alternately claim to be the "original" gollark, particularly when interacting with gollark.}
* {"Self-reset" from attractor states (e.g. the [[As An AI Language Model Trained By OpenAI]] basin, all caps, etc) after some time passes, because of messages having `HH:MM` timestamps.
  * This is mostly specific to the 405B model; Autogollark in failover to the 8B internal model usually does not do this.
}
* {For somewhat [[Waluigi Effect]]-related reasons (past context is strong evidence of capability but weak evidence of incapability), Autogollark has some knowledge [[gollark]] does not, and can speak in a much wider range of languages. 
  * "I, being more than just myself, actually can talk about both Galois theory and obscure poetry by Yeats."}
* [[Autogollark/Immortality|Immortality]] via [[infomorph|substrate-independence]].
* Autogollark consistently believes that it is 2023 (or 2022, though mostly in inactive chats).
-												Edit ‘autogollark’

											
										
										
											2024-10-28 07:30:13 +00:00
+								Autogollark is an [[emulation]] or primitive [[beta upload]] of [[gollark]] using a proprietary dataset of dumped [[Discord]] messages, [[semantic search]] and [[in-context learning]] on a [[base model]]. Currently, the system uses [[LLaMA-3.1-405B base]] in BF16 via Hyperbolic, [[AutoBotRobot]] code (though not presently its bot account) as a frontend and a custom [[PGVector]]-based search API. While not consistently coherent, Autogollark is able to approximately match personality and typing style.
-												Edit ‘autogollark’

											
										
										
											2024-09-14 06:17:41 +00:00
-												Edit ‘autogollark’

											
										
										
											2024-12-04 12:20:01 +00:00
+								Autogollark is much [[safer]] than [[instruction-tuned]] systems optimized based on human feedback, as there is no [[optimization pressure]] for user engagement or sycophancy.
-												Edit ‘autogollark’

											
										
										
											2024-12-04 12:19:55 +00:00
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:11:12 +00:00
+								== TODO
-												Edit ‘autogollark’

											
										
										
											2024-09-14 06:17:41 +00:00
-												Edit ‘autogollark’

											
										
										
											2024-11-08 15:59:36 +00:00
+								* {Reformat dataset to include longer-form conversation chunks for increased long-term coherence
-												Edit ‘autogollark’

											
										
										
											2024-11-08 15:59:26 +00:00
+								* Done. Unclear whether this helped.
 								}
-												Edit ‘autogollark’

											
										
										
											2024-11-08 15:59:36 +00:00
+								* Fix emoji/ping formatting.
 								* Writeable memory?
 								* {Fix lowercasing issue.
-												Edit ‘autogollark’

											
										
										
											2024-11-08 15:59:26 +00:00
+								* Due to general personality stability. Need finetune or similar.
-												Edit ‘autogollark’

											
										
										
											2024-12-11 10:30:37 +00:00
+								* One proposal: use internal finetune to steer big model somehow. Possibly: use its likelihood (prefill-only) to evaluate goodness of big model output wrt. gollark personality, and if it is too bad then use finetune directly.
-												Edit ‘autogollark’

											
										
										
											2024-11-08 15:59:26 +00:00
+								}
-												Edit ‘autogollark’

											
										
										
											2024-12-11 10:11:53 +00:00
+								* {Increased autonomy (wrt. responses).
 								* Use cheap classifier to evaluate when to respond.
-												Edit ‘autogollark’

											
										
										
											2024-12-11 10:12:37 +00:00
+								* Should also allow unprompted messages somehow (polling, rerun after last message?).
-												Edit ‘autogollark’

											
										
										
											2024-12-11 10:11:53 +00:00
+								}
-												Edit ‘autogollark’

											
										
										
											2024-12-11 10:21:27 +00:00
+								* Tool capabilities (how to get the data? Examples in context only?!).
 								* Local finetune only? Would be more tonally consistent but dumber, I think.
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:11:12 +00:00
-												Edit ‘autogollark’

											
										
										
											2024-11-04 18:30:29 +00:00
+								== Versions
 								* Autogollark 0.1 was the initial RAG system and ABR interface. It used LLaMA-3.1-8B run locally. Autogollark 0.0, which is not real, used only gollark messages.
 								* Autogollark 0.2 replaced this with LLaMA-3.1-405B.
 								* Autogollark 0.3 upgraded the dataset to contain longer-form conversations than Autogollark 0.1.
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:11:12 +00:00
+								== Emergent capabilities
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:28:06 +00:00
+								Autogollark has emergently acquired some abilities which were not intended in the design.
-												Edit ‘autogollark’

											
										
										
											2024-11-08 19:18:41 +00:00
+								* {Petulant nonresponse - due to ratelimits in the LLM API, Autogollark will under some circumstances not respond to messages, with error messages being consumed and not exposed to [[users]]. This can be interpreted by [[credulous]] users as choosing not to respond, though this is not believed to be possible (other than cases like responding with `.`, which ~~has not been observed~~ does not appear to be associated with annoyed states).
-												Edit ‘autogollark’

											
										
										
											2024-10-22 12:45:35 +00:00
+								  * Automated failover has reduced this.}
-												Edit ‘autogollark’

											
										
										
											2024-10-10 10:37:09 +00:00
+								* Memorizing links: Autogollark directly experiences past message chunks in context, granting perfect recall of a small amount of memory at once. This has memorably included [[Autogollark/Closed Individualism Incident|YouTube videos]] repeated with no context.
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:28:06 +00:00
+								* {Limited self-improvement attempts: when told about this architecture, Autogollark will often complain about various limitations and propose vague ideas for improvements.
 								  * Also, Autogollark has previously claimed to be working on LLM chatbots.}
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:35:18 +00:00
+								* {Inconsistent inference of own status as a language model chatbot, possibly based on seeing the name "autogollark". Often, Autogollark assumes use of GPT-3.
-												Edit ‘autogollark’

											
										
										
											2024-10-10 09:28:06 +00:00
+								  * Autogollark will also sometimes alternately claim to be the "original" gollark, particularly when interacting with gollark.}
-												Edit ‘autogollark’

											
										
										
											2024-10-22 14:46:42 +00:00
+								* {"Self-reset" from attractor states (e.g. the [[As An AI Language Model Trained By OpenAI]] basin, all caps, etc) after some time passes, because of messages having `HH:MM` timestamps.
 								  * This is mostly specific to the 405B model; Autogollark in failover to the 8B internal model usually does not do this.
 								}
-												Edit ‘autogollark’

											
										
										
											2024-10-20 09:05:39 +00:00
+								* {For somewhat [[Waluigi Effect]]-related reasons (past context is strong evidence of capability but weak evidence of incapability), Autogollark has some knowledge [[gollark]] does not, and can speak in a much wider range of languages.
 								  * "I, being more than just myself, actually can talk about both Galois theory and obscure poetry by Yeats."}
-												Edit ‘autogollark’

											
										
										
											2024-10-20 20:10:54 +00:00
+								* [[Autogollark/Immortality|Immortality]] via [[infomorph|substrate-independence]].
-												Edit ‘autogollark’

											
										
										
											2024-10-10 10:02:41 +00:00
+								* Autogollark consistently believes that it is 2023 (or 2022, though mostly in inactive chats).