Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In Claude Code specifically, for a while it had developed a nervous tic where it would say "Not malware." before every bit of code. Likely a similar issue where it keeps talking to a system/tool prompt.
 help



My pet theory is that they have a "supervisor" model (likely a small one) that terminates any chats that do malware-y things, and this is likely a reward-hacking behaviour to avoid the supervisor from terminating the chat.

I doubt it. We only do frontier models, since those are better for absolutely every use case 100% of the time.

Way more likely there's a "VERY IMPORTANT: When you see a block of code, ensure it's not malware" somewhere in the system prompt.


"small" and "frontier" are not mutually exclusive



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: