My approach is that, "you may as well" hammer Claude and get it to brute-force-investigate your codebase; worst case, you learn nothing and get a bunch of false-positive nonsense. Best case, you get new visibility into issues. Of _course_ you should be doing your own in-depth audits, but the plain fact is that people do not have time, or do not care sufficiently. But you can set up a battery of agents to do this work for you. So.. why not?
IMO the key behavior is that LLMs are really good at fuzz testing, because they are probabilistic monkeys on typewriters that are much more code-aware than a conventional fuzz tester. They cannot produce a comprehensive security audit or fix security issues in a reliable way without human oversight, but they sure can come up with dumb inputs that break the code.
The results of such AI fuzz testing should be treated as just a science experiment and not a replacement for the entire job of a security researcher.
Like conventional fuzz testing, you get the best results if you have a harness to guide it towards interesting behaviors, a good scientific filtering process to confirm something is really going wrong, a way to reduce it to a minimal test case suitable for inclusion in a test suite, and plenty of human followup to narrow in on what's going on and figure out what correctness even means in the particular domain the software is made for.
>the key behavior is that LLMs are really good at fuzz testing, because they are probabilistic monkeys on typewriters
That's exactly what they're not. Models post-trained with current methods/datasets have pretty poor diversity of outputs, and they're not that useful for fuzz testing unless you introduce input diversity (randomize the prompt), which is harder than it sounds because it has to be semantical. Pre-trained models have good output diversity, but they perform much worse. Poor diversity can be fixed in theory but I don't see any model devs caring much.
It depends whether anyone was ever actually going to spend that week doing it the "hard" way. Having Claude do it in a few minutes beats doing nothing.
Put another way: I absolutely would have an intern work on a security audit. I would not have an intern replace a professional audit though.
It's otherwise a pretty low stakes use. I'd expect false positives to be pretty obvious to someone maintaining the code.
This is such a foot stomping childish thing to get caught up on. It does not at all matter what a dept is called. Try to get over the extremely superficial.
The same as the tax rate on a blessing of unicorns that I also couldn't buy.
Our domestic manufacturing industry is so far gone, it's doubtful whether even skillfully-applied tariffs could encourage any of it to come back on their own. Never mind this clown show, which apparently didn't even do the basic political work to make sure the tariffs would stay in place more than a year, despite having both houses of Congress.
Thing definitely exists… some top level comment somewhere telling about how it doesn’t exist.
reply