When did we enter the twilight zone where bug trackers are consistently empty? The limiting factor in bug reduction is remediation, not discovery. Even developer smoke testing usually surfaces bugs far faster than they can be fixed, let alone actual QA.

To be fair, the limiting factor in remediation is usually finding a reproducible test case, which a vulnerability is by necessity. But I would still bet most systems have plenty of bugs in their trackers that are accompanied by a reproducible test case and are still bottlenecked on remediation resources.

This is of course orthogonal to the fact that patching systems that are insecure by design into security has so far been a colossal failure.



Bugs are not the same as (real) high severity bugs.

If you find a bug in a web browser, that's no big deal. I encounter bugs in web browsers all the time.

You figure out how to make a web page that, when viewed, deletes all the files on the user's hard drive? That's a little different, and not something people discover very often.

Sure, you'll still probably have a long queue of ReDoS bugs, but the only people who think those are security issues are people who enjoy the ego boost of having a CVE in their name.


Eh, with browsers you can tell the user to go to hell if they don't like a secure but broken experience. The problem in most software is that you commit to bad ideas and then have to upset people who have higher status than the software dev who would tell them to go to hell.


That might have been true pre-LLMs, but you can literally point an agent at the queue until it’s empty now.


You literally cannot, since ANY changes to code tend to introduce unintended (or at least not explicitly requested) new behaviors.


Eventual convergence? Assuming each defect fix has a 30% chance of introducing a new defect, we keep cycling until done?
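
Back-of-envelope, treating it as a branching process where each fix independently spawns one follow-up defect with probability p (numbers invented, and assuming at most one new defect per fix):

    import random

    def fixes_needed(p=0.3, rng=random.Random(0)):
        # One original bug; each fix has probability p of
        # spawning exactly one new defect.
        queue, total = 1, 0
        while queue:
            queue -= 1
            total += 1
            if rng.random() < p:
                queue += 1
        return total

    runs = [fixes_needed() for _ in range(100_000)]
    print(sum(runs) / len(runs))  # ~1.43, i.e. 1 / (1 - 0.3)

With p < 1, the expected total work per original bug is the geometric series 1/(1 - p), so the queue does empty - the trouble starts when the expected number of new defects per fix creeps past one.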


Assuming you can catch every new bug it introduces.

Both assumptions being unlikely.

You also end up with a codebase you let an AI agent trample over until it was satisfied: ballooned in complexity and full of redundant, brittle code.


You can have an AI agent refactor and improve code quality.


But has any of this code been vetted and verified to see if the approach works? This whole agentic code-quality claim is an assertion, but where is the literal proof?


If it can be trained with reinforcement learning, then it will happen.


Did we have code quality before LLMs?


Funnily enough I've literally never seen anyone demo this, despite all the other AI hype. It's the one thing that convinces me they're still behind.


It’s agents all the way down - until you have liability. At some point, it’s going to be someone’s neck on the line, and saying “the agents know” isn’t going to satisfy customers (or in a worst case, courts).


> until you have liability

And are you thinking this is going to start happening at some point, or what?

The letters I get every other month telling me I now have free credit monitoring because of a personal-info breach seem to suggest otherwise.


A firm has very different amounts of time, ability, and money to spend on following up on broken contracts than an individual does.


Sure it can. It's not like humans aren't already deflecting liability or moving it to insurance agencies.


> It's not like humans aren't already deflecting liability

They attempt to, sure, but it rarely works. Now, with AI, maybe it will, but that's sort of a worse outcome for the specific human involved - "If you're just an intermediary between the AI and me, WTF do I need you for?"

> or moving it to insurance agencies.

They aren't "moving" it to insurance companies; they are amortising the cost of the liability for a small premium.

That's a big difference.


At some point, the risk/return calculus becomes too expensive for insurance companies.

Usually that's after the premiums become too high for most people to pay.


Just today I had an agent add a fourth "special case" to a codebase, and I went back and DRY'd three of them.

Now, I used the agent to do a lot of the grunt work in that refactor, but it was still a design decision initiated by me. The chatbot, left unattended, would not have seen that it needed to be done. (And when, during my refactor, it tried to fold the fourth case back in, I had to stop it.)

(And for a lot of code, that's ok - my static site generator is an unholy mess at this point, and I don't much care. But for paid work...)


That's assuming each fix introduces at most one additional defect, which is obviously untrue.


Why would it converge?


The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by interactions between pieces of code, and there's now more code to interact with.

If you plot that out, the expected number of new defects per fix eventually exceeds one, and the total number of open defects starts to grow instead of shrink, because each bugfix now introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry.

How quickly you hit the point where new bugs arrive faster than fixes varies by organization and the skill of your software architects - good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen startups hit this tipping point before bringing the product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy: the end result of technical debt.
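
To make that concrete, here's a toy expected-value model (all numbers invented, not measured): each fix closes one bug, adds loc_per_fix lines, and introduces k * size new defects on average, so the per-fix defect rate grows linearly with codebase size.

    def open_bugs_over_time(k=1e-5, loc_per_fix=100,
                            size=50_000, bugs=200.0, fixes=5_000):
        # Expected-value model: each fix removes one bug, adds
        # loc_per_fix lines of code, and introduces k * size
        # new defects on average.
        history = []
        for _ in range(fixes):
            bugs += k * size - 1
            size += loc_per_fix
            history.append(bugs)
        return history

    h = open_bugs_over_time()
    print(round(h[499]), round(h[-1]))  # dips to ~75, ends ~10_000

The queue shrinks while k * size < 1, bottoms out once the expected number of new defects per fix hits one, and grows without bound after that - keeping coupling down is essentially keeping k small.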


As long as we're inventing numbers, what if it's a 90% chance?

What if it's a 200% chance, and every fix introduces multiple defects?


Except they don't converge; you see that if you use agents to evolve a codebase. We also saw exactly that in Anthropic's failed experiment to create a C compiler.


I’ve had mine on a Ralph loop, no problem. Just review the PR.


Which still means a single person with Claude can clear a queue in a day versus a month with a traditional team.


Your example must have incredible users or really trivial software.


The fact that KiCad still has a ton of highly upvoted missing features, and that FreeCAD still hasn't solved the topological naming problem, are existence proofs to the contrary.


Shouldn't be downvoted for saying this. There are active repos where this is happening.

"BuT ThE LlM iS pRoBaBlY iNtRoDuCiNg MoRe BuGs ThAn It FiXeS"

This is an absurd take.


It probably is introducing more bugs, because I think some people don't understand how bugs work.

Very, very rarely is a bug a mistake. As in, something unintentional that you just fix and boom, done.

No no. Most bugs are intentional, and the "bug" part is some unintended side effect that is a necessary, but unforeseen, consequence of the main effect. So you can't just "fix" the bug without changing behavior, changing your API, changing guarantees, whatever.

And that's how you get the one-month one-liner. Writing the one line is easy. But you have to spend a month debating whether you should do it, and what will happen if you do.


So, you have already fixed all the bugs and are now just cruising through life?


I wonder whether people like you have actually used Claude for any length of time.

I use it all day. I consider it a near-miracle. Yet I correct it multiple times daily.


> I wonder whether people like you have actually used Claude for any length of time.

I stated that LLMs are actively being used in repos today to chew through backlog items, and your response is to wonder if I've ever used Claude.

To me it's surprising that someone like you, who appears to have a reading comprehension deficiency, is able to use Claude.



