Hacker News | colordrops's comments

Becoming? Crazy stuff has been done in Minecraft for the longest time. Someone built a functional CPU and computer in Minecraft in 2010.

I agree: running simulated computers inside of Minecraft is a significantly more impressive technical feat than bolting on display surfaces to planes with a mod.

There's a big difference between something being compiled to run inside of Minecraft, versus running a sidecar that streams back a display. It's the difference between compiling and running on your machine, and streaming back a cloud machine using RDP.

Not like this makes a difference to users, who don't know how any of this works. But we are on Hacker News...


Just because someone has done a more impressive project in Minecraft doesn't mean this one isn't interesting

People not only built a functional computer in Minecraft, people have run Minecraft on that functional computer in Minecraft. Extremely slowly, obviously, but it did technically work.

Unless specific issues have been identified that were introduced by it being "vibe coded", isn't a reaction to reject it outright without actually checking the ground truth just exhibiting the behavior you are criticizing?

It's just a trust issue. Have you seen the absolute state of the Claude Code CLI development? I don't want that to suddenly happen to Bun after I've already used it for production stuff.

I don't see any hypocrisy in the comment you are criticizing. The behavior they are criticizing appears to be vibe coding. How is rejecting something for being vibe coded "exhibiting the behavior" of vibe coding?

You aren't allowed to dismiss vibe coded software based on the slop vibes. It must be well-researched and human reviewed in order to have an opinion.

“Don’t like it – don’t use it, nobody owes you anything”, then the next thread “noooo, why have you stopped using it, you must support my slop”. Absolute cinema.

There’s a big difference between vibe coding and agentic engineering. If you think they are at all the same thing, you need to update your priors

No "engineering" ever stepped foot near that pile of godawful slop.

> pile of godawful slop.

Any software engineer that still rejects the concept of agentic coding is frankly NGMI. If you still see AI this way, you simply never bothered to update your priors, which is just not survivable in this career. I do hope you're already independently wealthy.


The ground truth is that the new maintainers can’t possibly have a good understanding of the many millions of lines of vibe-translated code. Even assuming that the code happens to work okay in its current state, the lack of understanding means a high risk that its continuing maintenance won’t result in a satisfactory level of reliability.

Aren't the maintainers the same people? I haven't seen any talk of who's working on it changing drastically.

You want the yt-dlp authors to review the entire post-migration Bun codebase?

And what are you referring to as "behavior"?


Do you review your dependencies’ entire code?

I trust maintainers of my dependencies, I don’t trust Bun anymore.

Virtually no one reviews entire code bases of dependencies, what on earth are you talking about?

They reviewed it in the sense of integrating something that worked, this is something maybe not completely different but different enough to give pause.

No, would you use a proudly vibe-coded banking app?

How would you know it wasn’t vibe coded?

Because vibe coders are like Arch Linux users – they will proudly tell you.

I'm not sure what "exhibiting the behavior you are criticizing" would even mean here.

BUT.

"Ignore anything but actual problems" is a terrible stance to take generally for software and dependency selection. Incidents are fairly sparse, process is much easier to observe. So if you can find connections between process and incident possibility, that's a very reasonable heuristic. And it's easy to find examples of overaggressive LLM usage introducing problems into software.


You are putting words in my mouth, I never said anything about such a stance.

The vast majority of new software is written using AI. The problem is not that it is written by AI, but rather that some people treat it like a black box. It is entirely possible to use AI to write code and verify that it is correct. Even Linus Torvalds is allowing AI generated code into the Linux kernel as long as it's managed properly.


>The vast majority of new software is written using AI. The problem is not that it is written by AI

How on earth does this follow? It's common, so it should be accepted without scrutiny?

>The problem is not that it is written by AI, but rather that some people treat it like a black box.

Yes, and guns don't kill people. Obviously the issue has two facets. It would be irrational to say "AI is flawless" or "humans are flawless".

Allowing genAI code does not imply blindly trusting genAI code.

>as long as it's managed properly.

Correct. Hence the issue. This was vibe-coding by even the strictest definitions of the term. Vibe-coding is, by definition, not "properly managed".


You are referring to black-box coding, not vibe-coding. There is no strong formal definition of that word. Is there evidence that they just fired off the LLMs and didn't review or test the new bun code?

The evidence that they didn't review it is that a million-line rewrite was merged 8 days after it began being written. It's simply not possible for a team that size to review that much code in that little time.

As far as testing - yes, they do have a test suite that it was checked against during the rewrite, but that still means that any behaviour that wasn't strictly tested for by that suite could have changed and it would still pass.
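To make the point about test coverage concrete, here is a minimal sketch. Everything in it is hypothetical: `parse_port_original`, `parse_port_rewrite` and `run_suite` are made-up names for illustration and have nothing to do with Bun's actual code or test suite. It only shows how two implementations can both pass the same suite while disagreeing on an input the suite never exercises.

```python
# Hypothetical example: none of these functions come from Bun's codebase.

def parse_port_original(text):
    """Original behaviour: tolerates surrounding whitespace."""
    return int(text.strip())


def parse_port_rewrite(text):
    """Rewrite: rejects anything that is not purely digits."""
    if not text.isdigit():
        raise ValueError(f"invalid port: {text!r}")
    return int(text)


def run_suite(parse_port):
    """The only cases this (hypothetical) test suite checks."""
    assert parse_port("80") == 80
    assert parse_port("8080") == 8080
    return True


# Both implementations pass the suite...
print(run_suite(parse_port_original), run_suite(parse_port_rewrite))

# ...but they disagree on an input the suite never exercised.
print(parse_port_original(" 443 "))
try:
    parse_port_rewrite(" 443 ")
except ValueError as exc:
    print("rewrite raised:", exc)
```

A green suite here says nothing about the whitespace case, which is the kind of untested behaviour that could change silently in a large rewrite.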


> The vast majority of new software is written using AI.

Not even 10 percent is. Good lord. Go outside and touch grass.


Or Jellyfin or Navidrome if you want to use free open source that does a decent enough job.

Self driving will never handle all corner cases until they essentially have a frontal cortex. They probably need something like an LLM to help with very high level abstract situations, e.g. avoiding a hurricane like someone else mentioned in this thread.

A frontal cortex isn't enough; there are plenty of corner cases that humans fail at too. The real test is if self-driving performs on par with, or better than, humans in the vast majority of cases. If it saves 50,000 lives a year to go with self-driving, it's a net-win even if there are a few people who die in situations where they would have survived with a human driver behind the wheel.

Self driving cars are not going to be accepted if they have only marginally better success rates than humans. Just look at the news. Every minor self driving incident is endlessly magnified by the media while millions of human-caused accidents are just a part of life. That's just how our brains work. All major decisions are made primarily based on emotion, not analytics.

In the case of driving and flying a significant part is the passenger's agency. There are many common sense things you can do to reduce your own chance of crashing your car. Drive defensively, don't speed, don't drive drunk. There is very little a passenger in an AV or on an airplane can do to prevent things from going wrong. And it turns out we really don't like having no agency over our own travels and that's why we have such high safety requirements for airlines — but not general aviation — and now AVs.

Human accidents don't get treated as "just a part of life", serious human driving errors are often considered so egregious that the person making the error picks up a driving ban or even a custodial sentence.

So it's actually entirely rational that the bar for companies to be able to ship software that makes those fatal errors without consequence other than an insurance payout should be higher. That is especially so because fatal error rates can only be estimated accurately over the order of millions of miles, driverless systems are more prone to systematic error or regression bugs than the equivalent-sized set of human drivers, and the cost and appeal of autonomy probably mean more experienced drivers get replaced first and more journeys get taken.


There are over 6 million auto accidents in the US per year. How many of them make the news? I'm willing to bet that most people don't even know about pedestrian deaths that occur a few blocks away from where they live, at intersections they walk through every day. Meanwhile the same people will read about how a self driving car got into a fender bender on the other side of the country and confidently proclaim "this technology isn't safe, I'm never going to use it".

Sure, autonomous vehicles are new, experimental technology so they're inherently more newsworthy, and news reports aren't a substitute for data - though in this case it's a good illustration that AI can make errors humans would be less likely to even if it is objectively better than the average driver at parking and not speeding.

This does not in any way refute my argument that it would also be irrational to set the safety bar for autonomous vehicles at "marginally better than humans". AI failure modes are distributed completely differently from human ones, and a sufficiently serious edge-case bug triggered only once every hundred million miles might make the autonomous system more likely to kill you than humans[1]. For that and other reasons it's almost impossible to quantify whether a particular firmware update actually is safer than the average driver (it takes more than about 10 billion miles to approach statistical significance if you're worried about fatalities rather than only weakly-correlated scrape rates, and then you've got to wonder whether the driving conditions are well matched). That matters especially if we're using that statistical argument not just to license the vehicles for road use but to absolve autonomous system developers of potential criminal liability for actions taken by their software, a luxury that humans who wipe out pedestrians with similar driving aberrations wouldn't get.

[1] The US had 1.38 fatalities per 100 million vehicle miles in 2023, skewed significantly upwards by DUI and other egregious driving behaviour. The figure is less than half that in other countries with different road conditions and also more in-depth driver education. Humans have a lot of car accidents, but they also drive a lot of miles.
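For a rough sense of scale, here is a back-of-the-envelope sketch using the 1.38 per 100 million miles figure from the footnote. It assumes, as a simplification of mine and not something claimed above, that fatalities behave like a Poisson count, so the relative spread of an observed count k is about 1/sqrt(k). The function names are invented for illustration, and this is not a proper significance test.

```python
import math

# Figure quoted in the footnote above: US fatalities per 100 million vehicle miles (2023).
HUMAN_RATE_PER_MILE = 1.38 / 100_000_000


def expected_fatalities(miles, rate_per_mile=HUMAN_RATE_PER_MILE):
    """Expected number of fatalities over a given mileage at a fixed rate."""
    return miles * rate_per_mile


def relative_uncertainty(expected_count):
    """Assumption: Poisson count, so std dev is sqrt(k) and relative spread is 1/sqrt(k)."""
    return 1.0 / math.sqrt(expected_count)


for miles in (100_000_000, 1_000_000_000, 10_000_000_000):
    k = expected_fatalities(miles)
    print(f"{miles:>14,} miles: ~{k:6.1f} expected fatalities, "
          f"relative spread ~{relative_uncertainty(k):.0%}")
```

Under these assumptions, even 10 billion miles yields only about 138 expected fatalities at the human rate, with a relative spread of roughly 9%, so a system that is only marginally better or worse than humans would be hard to tell apart from noise.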


Getting banned from driving is extremely rare. Most people convicted of DUI are still allowed to drive.

Maybe. But insurance rates, and the government's enforcement of laws, are based on analytics, and overcome a lot of human emotional bias.

Humans don't handle all corner cases. People can be slow to react to completely novel or surprising situations. There will be corner cases where humans generally do better than a machine, but the simple rule to slow down and come to a halt if things look too weird or confusing will almost always be the right answer.

Ideally, driverless cars will one day be better drivers than humans and this will save tens of thousands of traffic deaths per year. Holding up progress because cars will be confused in extremely rare or improbable situations will cost more lives than it saves.


Not only are people slow to react to unusual situations, but this is taken advantage of by city designers to force people to slow down.

Random planters in the middle of the road? Streets that narrow and then widen? Drivers start slowly creeping along, which means they are less likely to injure pedestrians.


I think self-driving cars will only become better once they can do all the learning in real time and on-board. Otherwise, they will only be as good as the data they trained on - which is ultimately real meat driver data and derivations of said data.

They will add flooded streets to the training simulation and this problem will go away. Eventually, the corner cases not in the training simulation will be so corner they basically never happen. Waymo can be incredibly successful without dealing with "surprise clown parade" or whatever.

this is absolutely already a thing under development, you can see Waymo is hiring for reasoning roles

Tesla's already doing it too

how would a llm help

maybe a little biological brain engineered to think it is a car with api access to the car hardware via the llm?

imagine you get into the car and in the center console you just see a floating brain in vat like fallout


The driving ML model will take care of the next 10 seconds of driving, in a fast loop deciding what steering and throttle commands to give.

The LLM will apply the high level reasoning needed to deal with longer time horizons and complex decisions, like deciding that the best way to reach the car wash 100 yards away is by walking.


Lmao what…

You sound like an econ prof: full of it and hand waving away with hypotheticals.


You are grasping at straws. No one said open source is perfect. But it's just an obvious fact that open source is going to be easier to audit than closed source.

No, I'm asking questions...... not pretending I have answers.

But isn't that their point? In the age of AI, maybe being "easier to audit" is as much a risk as an assurance? I'm not sure I agree, but it is interesting to mull over. Further, either way, your tone and response are not very charitable, to say the least. From the outside, you are the only one blustering and grasping here. Not everything needs to be so antagonistic maybe?

Reverse uno. The same AI can be used to fix the holes in the open source code. And a LOT more AI review by benevolent parties is gonna hit that open source code than the closed source.

Nope, I don't use closed source browsers. Hell no.

Have you tried to google "vivaldi source"? You might be surprised.

Yes, have you? It's not open source. Not sure what your point is.

I knew when I started hearing ads for BitWarden on NPR that the good times were over.

"happy", I don't understand how any user can be happy using these tools. Begrudging, maybe. These tools don't even need to exist. The government already knows all it needs to know and just needs your signature and a check. The only reason TurboTax exists is because of lobbying.

Maybe I'm misunderstanding how these models work, but isn't it more the responsibility of the harness and its prompts rather than the model itself to make sure that a result is generated with explicit sources?

Probably.

"All" a model is doing is predicting the next words, based on the statistical distribution of words it has seen similar to the ones read/produced so far.

We push a model towards a particular set of distributions through context. If I ask a model "What is the capital of France?", there is a non-zero chance it goes down the dad joke answer of "The letter F". The far more likely option is "Paris", because the joke appears much less often in training material, but if I wanted to be absolutely sure of getting a consistent geography answer I'd address that with additional context. We can add context via prompts, RAG, agents, skills and so on.
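As a toy illustration of the "Paris" versus "The letter F" point, here is a small sketch. All the numbers are invented, and treating added context as an additive shift on each candidate's score is my simplification: a real model computes a distribution over its whole vocabulary from learned weights. The names `BASE_LOGITS`, `GEOGRAPHY_CONTEXT_SHIFT` and `with_context` are hypothetical.

```python
import math

# Toy scores (logits) for two candidate answers; the numbers are invented for illustration.
BASE_LOGITS = {"Paris": 4.0, "The letter F": 0.5}

# Hypothetical effect of extra context such as "Answer as a geography teacher."
GEOGRAPHY_CONTEXT_SHIFT = {"Paris": 2.0, "The letter F": -2.0}


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def with_context(logits, shift):
    """Assumption: model added context as an additive shift on each candidate's score."""
    return {tok: score + shift.get(tok, 0.0) for tok, score in logits.items()}


plain = softmax(BASE_LOGITS)
steered = softmax(with_context(BASE_LOGITS, GEOGRAPHY_CONTEXT_SHIFT))

for name, dist in (("no extra context", plain), ("geography context", steered)):
    print(name, {tok: round(p, 4) for tok, p in dist.items()})
```

In this sketch the joke answer keeps a non-zero probability in both cases; the extra context only makes it much less likely, which matches the description above.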

However, when training a model, we select the material. We could show it a lot more geography information (or dad jokes!), and skew the statistical distribution in the direction we wanted. We could also decide to design the system prompt towards the direction we prefer - which the user would interpret as "the model" - and so nudge the context model-wide. We can also construct the interaction to iterate on context with a specific framing and call it "reasoning".

In this specific example, you could therefore solve the problem by a) skewing training towards mathematical papers, which likely degrades performance in general and likely for the specific case too, b) training the user to provide better context/prompts for mathematical work, shifting the workload to them, which feels very "a la 2024", c) publishing agents and skills that are tailored to mathematics work (very "a la 2026"), d) tweaking the system prompt for when the model is doing mathematics work, which the user would see as "the model" doing the change, but you and I might look under the hood and say that is in the harness or a specific type of prompt, e) adding "reasoning" execution that is set to focus on mathematical formatting, or f) a mixture of the above.

Right now we're probably looking at agents and skills. I think over time we're going to see smaller models targeted towards domains with a mixture of all of it, where some of this sits at user configurable levels, and some is "baked in" via training, system prompts and execution modes, but from a user perspective it's all just "the model".


I don't think you are misunderstanding how models work, but I think the parent comment meant that the training of the models should push them to include attributions in their native output so they will more likely do so without reinforcement through the harness.

Where would theatrical art metal like Sleepytime Gorilla Museum fit on this?
