When did we enter the twilight zone where bug trackers are consistently empty? The limiting factor in bug reduction is remediation, not discovery. Even developer smoke testing usually surfaces bugs far faster than they can be fixed, let alone actual QA.

To be fair, the limiting factor in remediation is usually finding a reproducible test case, which a vulnerability is by necessity. But I would still bet most systems have plenty of bugs in their trackers that are accompanied by a reproducible test case and are still bottlenecked on remediation resources.

This is of course orthogonal to the fact that patching systems that are insecure by design into security has so far been a colossal failure.



Bugs are not the same as (real) high severity bugs.

If you find a bug in a web browser, that's no big deal. I encounter bugs in web browsers all the time.

You figure out how to make a web page that, when viewed, deletes all the files on the user's hard drive? That's a little different, and not something people discover very often.

Sure, you'll still probably have a long queue of ReDoS bugs, but the only people who think those are security issues are people who enjoy the ego boost of having a CVE in their name.


Eh, with browsers you can tell the user to go to hell if they don't like a secure but broken experience. The problem in most software is that you commit to bad ideas and then have to upset people who have higher status than the software dev who would tell them to go to hell.


That might have been true pre-LLMs, but you can literally point an agent at the queue until it’s empty now.


You literally cannot, since ANY changes to code tend to introduce unintended (or at least not explicitly requested) new behaviors.


Eventual convergence? Assuming each defect fix has a 30% chance of introducing a new defect, we keep cycling until done?
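
Back-of-envelope, treating it as a branching process where each fix independently spawns one follow-up defect with probability p (numbers invented, and assuming at most one new defect per fix):

    import random

    def fixes_needed(p=0.3, rng=random.Random(0)):
        # One original bug; each fix has probability p of
        # spawning exactly one new defect.
        queue, total = 1, 0
        while queue:
            queue -= 1
            total += 1
            if rng.random() < p:
                queue += 1
        return total

    runs = [fixes_needed() for _ in range(100_000)]
    print(sum(runs) / len(runs))  # ~1.43, i.e. 1 / (1 - 0.3)

With p < 1, the expected total work per original bug is the geometric series 1/(1 - p), so the queue does empty - the trouble starts when the expected number of new defects per fix creeps past one.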


Assuming you can catch every new bug it introduces.

Both assumptions being unlikely.

You also end up with a codebase you let an AI agent trample over until it was satisfied: ballooned in complexity and full of redundant, brittle code.


You can have an AI agent refactor and improve code quality.


But has any of this code been vetted and verified to see if the approach works? This whole agentic code-quality claim is an assertion, but where is the literal proof?


If it can be trained with reinforcement learning, then it will happen.


Did we have code quality before LLMs?


Funnily enough I've literally never seen anyone demo this, despite all the other AI hype. It's the one thing that convinces me they're still behind.


It’s agents all the way down - until you have liability. At some point, it’s going to be someone’s neck on the line, and saying “the agents know” isn’t going to satisfy customers (or in a worst case, courts).


> until you have liability

And are you thinking this is going to start happening at some point, or what?

The letters I get every other month telling me I now have free credit monitoring because of a personal-info breach seem to suggest otherwise.


A firm has very different amounts of time, ability, and money to spend on following up on broken contracts than an individual does.


Sure it can. It's not like humans aren't already deflecting liability or moving it to insurance agencies.


> It's not like humans aren't already deflecting liability

They attempt to, sure, but it rarely works. Now, with AI, maybe it will, but that's sort of a worse outcome for the specific human involved - "If you're just an intermediary between the AI and me, WTF do I need you for?"

> or moving it to insurance agencies.

They aren't "moving" it to insurance companies; they are amortising the cost of the liability for a small premium.

That's a big difference.


At some point, the risk/return calculus becomes too expensive for insurance companies.

Usually that's after the premiums become too high for most people to pay.


Just today I had an agent add a fourth "special case" to a codebase, and I went back and DRY'd three of them.

Now, I used the agent to do a lot of the grunt work in that refactor, but it was still a design decision initiated by me. The chatbot, left unattended, would not have seen that it needed to be done. (And when, during my refactor, it tried to fold the fourth case back in, I had to stop it.)

(And for a lot of code, that's ok - my static site generator is an unholy mess at this point, and I don't much care. But for paid work...)


That's assuming each fix introduces at most one additional defect, which is obviously untrue.


Why would it converge?


The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by interactions between pieces of code, and there's now more code to interact with.

If you plot that out, the expected number of new defects per fix eventually exceeds one, and the total number of open defects starts to grow instead of shrink, because each bugfix now introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry.

How quickly you hit the point where new bugs arrive faster than fixes varies by organization and the skill of your software architects - good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen startups hit this tipping point before bringing the product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy: the end result of technical debt.
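
To make that concrete, here's a toy expected-value model (all numbers invented, not measured): each fix closes one bug, adds loc_per_fix lines, and introduces k * size new defects on average, so the per-fix defect rate grows linearly with codebase size.

    def open_bugs_over_time(k=1e-5, loc_per_fix=100,
                            size=50_000, bugs=200.0, fixes=5_000):
        # Expected-value model: each fix removes one bug, adds
        # loc_per_fix lines of code, and introduces k * size
        # new defects on average.
        history = []
        for _ in range(fixes):
            bugs += k * size - 1
            size += loc_per_fix
            history.append(bugs)
        return history

    h = open_bugs_over_time()
    print(round(h[499]), round(h[-1]))  # dips to ~75, ends ~10_000

The queue shrinks while k * size < 1, bottoms out once the expected number of new defects per fix hits one, and grows without bound after that - keeping coupling down is essentially keeping k small.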


As long as we're inventing numbers, what if it's a 90% chance?

What if it's a 200% chance, and every fix introduces multiple defects?


Except they don't converge; you see that if you use agents to evolve a codebase. We also saw exactly that in Anthropic's failed experiment to create a C compiler.


I’ve had mine on a Ralph loop, no problem. Just review the PR.


Which still means a single person with Claude can clear a queue in a day versus a month with a traditional team.


Your example must have incredible users or really trivial software.


The fact that KiCad still has a ton of highly upvoted missing features, and that FreeCAD still hasn't solved the topological naming problem, are existence proofs to the contrary.


Shouldn't be downvoted for saying this. There are active repos where this is happening.

"BuT ThE LlM iS pRoBaBlY iNtRoDuCiNg MoRe BuGs ThAn It FiXeS"

This is an absurd take.


It probably is introducing more bugs, because I think some people don't understand how bugs work.

Very, very rarely is a bug a mistake. As in, something unintentional that you just fix and boom, done.

No no. Most bugs are intentional, and the "bug" part is some unintended side effect that is a necessary, but unforeseen, consequence of the main effect. So you can't just "fix" the bug without changing behavior, changing your API, changing guarantees, whatever.

And that's how you get the one-month one-liner. Writing the one line is easy. But you have to spend a month debating whether you should do it, and what will happen if you do.


So, you have already fixed all the bugs and are now just cruising through life?


I wonder whether people like you have actually used Claude for any length of time.

I use it all day. I consider it a near-miracle. Yet I correct it multiple times daily.


> I wonder whether people like you have actually used Claude for any length of time.

I stated that LLMs are actively being used in repos today to chew through backlog items, and your response is to wonder if I've ever used Claude.

To me it's surprising that someone like you, who appears to have a reading comprehension deficiency, is able to use Claude.



