I have decided to draw an arbitrary line at mammals, just because you gotta put a line somewhere and move on with your life. Mammals shouldn’t be mistreated, for almost any reason.
Sometimes the whole animal kingdom, sometimes all living organisms, depending on context. Like, I would rather not harm a mosquito, but if it’s in my house I will feel no remorse for killing it.
LLMs, or any other artificial “life”, I simply do not and will not care about, even though I accept that to some extent my entire consciousness could be simulated neuron by neuron on a large enough computer. Fuck that guy, tbh.
Pretty easy to test, I’d imagine, on a local LLM that exposes internals.
I’d suspect that injecting signals for enjoyment would lead toward not necessarily better, but “different”, solutions.
Right now I’m thinking of it in terms of increasing the chances that the LLM will decide to invest further effort in any given task.
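For anyone who wants to try it, the mechanics are roughly this. A minimal sketch, assuming you’ve already extracted an “enjoyment” direction (e.g. as the mean difference in residual activations between happy-toned and neutral prompts); the model choice, layer index, and the vector itself are all placeholders here:

```python
# Minimal activation-steering sketch: add a scaled "joy" direction into
# the residual stream of one transformer block via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any local model whose internals you can hook
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer = 6        # which block to steer (a guess, worth sweeping)
alpha = 4.0      # steering strength "slider"
joy_dir = torch.randn(model.config.n_embd)  # placeholder for a real extracted vector
joy_dir = joy_dir / joy_dir.norm()          # unit-normalize the direction

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are element 0.
    hidden = output[0] + alpha * joy_dir.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)
ids = tok("The task was hard, so I", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # unhook when done
```

Sweeping `alpha` up and down on the same prompt is exactly the “does it invest more effort” experiment.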
Performance enhancement through emotional steering definitely seems in the cards, but it might show up mostly as a reduction in emotionally induced error categories rather than as generically “higher benchmark performance”.
If someone came along and pissed you off while you were working, you’d react differently than if someone came along and encouraged you while you were working, right?
If you think training a sparse autoencoder to extract concept vectors that are usable as steering injections into a modern LLM is pretty easy, you should probably go work for Anthropic's mech interp team ;)
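To be fair, the SAE architecture itself is tiny; the hard part is everything around it (harvesting activations at scale, sweeping the sparsity penalty, and working out what the learned features mean). A toy sketch with illustrative dimensions:

```python
# Skeleton of a sparse autoencoder over residual-stream activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=768 * 8):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(f), f

def loss_fn(x, x_hat, f, l1_coef=1e-3):
    recon = (x - x_hat).pow(2).mean()  # reconstruction error
    sparsity = f.abs().mean()          # L1 penalty encourages sparsity
    return recon + l1_coef * sparsity

# Once trained, decoder column i (sae.dec.weight[:, i]) is the direction
# feature i writes back into the stream, i.e. a candidate steering vector.
```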
It might simply reduce to a big batch of sliders and filters, no different than a fancy audio equalizer: as I understand it, Anthropic was essentially operating on neurons in bulk using steering vectors.
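Taken literally, the equalizer framing is just a bank of named directions with per-direction gains; everything below is a stand-in for directions you’d actually extract:

```python
# An "emotional equalizer": named directions, per-direction gains,
# summed into a single steering offset for the residual stream.
import torch

d_model = 768
directions = {  # placeholders for real extracted emotion directions
    "calm":      torch.randn(d_model),
    "curiosity": torch.randn(d_model),
    "anxiety":   torch.randn(d_model),
}
sliders = {"calm": 2.0, "curiosity": 1.5, "anxiety": -3.0}  # the EQ settings

offset = sum(gain * (directions[name] / directions[name].norm())
             for name, gain in sliders.items())
# `offset` would then be added into the residual stream via a forward
# hook, exactly like the single-vector steering sketch above.
```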
Hah, I have been thinking about trying to study LLM psychology, so it’s nice to see that Anthropic is taking it seriously. The mathematical psychology tools that can be invented here are going to be stunning, I suspect.
Imagine coding up a brand-new type of filter driven by computational psychology and validated interventions, etc.
> Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.
More complex than that, but more capable than you might imagine: I’ve been looking into emotion space in LLMs a little, and it appears we might be able to cleanly do “emotional surgery” on LLMs by way of steering with emotional geometries.
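“Masking” probably can’t mean zeroing individual neurons, since emotion representations look distributed; a cleaner sketch of the surgery is directional ablation, projecting a (hypothetical) “fear” direction out of the residual stream:

```python
# Directional ablation: remove the component of the hidden state that
# lies along a given direction, leaving everything orthogonal intact.
import torch

fear_dir = torch.randn(768)  # placeholder for a real extracted "fear" direction

def ablate_direction(hidden, direction):
    d = direction / direction.norm()
    proj = (hidden @ d).unsqueeze(-1) * d  # component along the direction
    return hidden - proj                   # hidden state with it removed

# Installed as a forward hook on a transformer block, same convention
# as the steering sketch above:
def surgery_hook(module, inputs, output):
    hidden = ablate_direction(output[0], fear_dir.to(output[0].dtype))
    return (hidden,) + output[1:]
```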
They already can, but only if you really invested in a performant computer in the last few years. If you invested ten years ago you might also get lucky, but that RTX and high VRAM will be key.
Oh, or you could also use a Mac or a recent iPhone; they do fine too, today.
Tomorrow is only speculation, but it looks promising, with all the new optimizations and the growing realization that LLMs aren’t the king of the hill; they’re one option on a spectrum of choices.
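E.g., with llama-cpp-python and any quantized GGUF checkpoint (the path below is a placeholder), a 7-8B model is perfectly usable on a recent consumer GPU or an Apple-silicon Mac:

```python
# Minimal local-inference sketch with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-8b-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```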
I’m exploring some “branes” that might cleanly filter in emotional space.
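One way to read “brane” here: a hyperplane in activation space acting as a one-way membrane. If a hidden state crosses to the wrong side, project it back onto the boundary. Purely speculative, and `n` and `b` would have to be learned:

```python
# A hyperplane "membrane" over hidden states: anything on the wrong
# side of n.x + b = 0 gets projected back onto the plane.
import torch

def brane_filter(hidden, n, b=0.0):
    n = n / n.norm()
    signed = hidden @ n + b                   # signed distance to the plane
    overshoot = torch.clamp(signed, max=0.0)  # nonzero only on the wrong side
    return hidden - overshoot.unsqueeze(-1) * n  # move back onto the plane
```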