Edit ‘autogollark’
parent 069d294af2
commit 32e0d7904b
@@ -12,6 +12,7 @@ Autogollark is much [[safer]] than [[instruction-tuned]] systems optimized based
 * {Fix lowercasing issue.
 * Due to general personality stability. Need finetune or similar.
 * One proposal: use internal finetune to steer big model somehow. Possibly: use its likelihood (prefill-only) to evaluate goodness of big model output wrt. gollark personality, and if it is too bad then use finetune directly.
+* Is GCG code salvageable? NanoGCG, maybe.
 }
 * {Increased autonomy (wrt. responses).
 * Use cheap classifier to evaluate when to respond.
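
Review note: the likelihood-gating proposal in this hunk could look roughly like the sketch below. This is an illustration under stated assumptions, not Autogollark's actual code. `finetune_logprobs` and `finetune_generate` are hypothetical callables standing in for the internal finetune (the first would return per-token log-probabilities of a candidate reply from one prefill pass, with no sampling), and the threshold value is a placeholder that would need tuning.

```python
from typing import Callable, Sequence

# Hypothetical interface: per-token log-probabilities that the finetune assigns
# to `reply` given `context`, computed in a single prefill pass (no sampling).
TokenLogprobFn = Callable[[str, str], Sequence[float]]


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalised log-likelihood, so long replies are not penalised for length alone."""
    if not token_logprobs:
        return float("-inf")
    return sum(token_logprobs) / len(token_logprobs)


def choose_reply(
    context: str,
    big_model_reply: str,
    finetune_logprobs: TokenLogprobFn,
    finetune_generate: Callable[[str], str],
    threshold: float = -3.0,  # assumed placeholder; needs tuning on real data
) -> str:
    """Keep the big model's reply if the finetune finds it likely enough, else fall back."""
    score = mean_logprob(finetune_logprobs(context, big_model_reply))
    if score >= threshold:
        return big_model_reply
    # Reply looks too unlike the target personality: use the finetune directly.
    return finetune_generate(context)
```

Mean rather than summed log-probability is a design choice made here so that the threshold does not depend on reply length; the source does not specify either.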