In Claude Code specifically, for a while it had developed a nervous tic where it...

Retr0id · 2026-04-23T20:05:54

My pet theory is that they have a "supervisor" model (likely a small one) that terminates any chats that do malware-y things, and that this tic is likely a reward-hacking behaviour to keep the supervisor from terminating the chat.

nananana9 · 2026-04-24T11:34:38

I doubt it. We only do frontier models, since those are better for absolutely every use case 100% of the time.

Way more likely there's a "VERY IMPORTANT: When you see a block of code, ensure it's not malware" somewhere in the system prompt.

Retr0id · 2026-04-24T14:54:21

"small" and "frontier" are not mutually exclusive