“Hey Claude, you have a bunch of skills defined, some mcps, and memory filled with useful stuff. I want to use you on a machine accessible over SSH at <host>, can you clone yourself over?”
Why are people thanking Google? That’s like another slap in the face of Epic, who burned through their millions to put a (soft) end to Google and Apple’s dominance. They still get to keep a significant cut.
Who cares, this is how the American justice system works. AFAIK - only DAs and govt (sometimes) look out for people’s interests, and the occasional class action lawsuits. Many civil liberties cases were one black man or black woman, or a woman, fighting against some company/establishment/govt and that ended up benefiting all others (equal rights, right to vote, right to abortion, etc).
In this case, all other businesses get the same terms as Epic. In my eyes that’s a win, better than the system that existed before.
Yep. Spot on. And you can tell it’s true because one of the primary arguments, that App Store fees drive up prices for customers, hasn’t held up: once the fees were removed, prices for customers did not come down.
It's just big billion dollar corporations deciding on who keeps what cut.
I'm hardly a fan of Epic, but considering inflation and rising supply chain costs, a price that remains flat may be a price that would have otherwise risen.
They might also direct the money towards funding more exclusives. Epic's funding has enabled some games to be made that wouldn't have been otherwise, or that wouldn't have been as full featured without that up-front cash.
They sell gambling to children via lootboxes; I'm not saying they're the good guy corp. But removing Apple and Google's monopoly over phone apps and app stores would only be a good thing, in my opinion.
Sure but it's not just Epic. I've seen other services, ranging from Netflix to Spotify increase subscription prices.
I don't disagree with your point about inflation, but we also can't really run the counterfactual, and I'm personally not inclined to give the benefit of the doubt here. As an aside we generally have some level of inflation and so while this argument may have been more convincing during a period of rapid inflation, it becomes less convincing over time.
I think the reality is these services have massive margins and so there was never any intent on the part of Epic at least, to lower prices. It was always to just capture more value for their company. I don't blame them for doing that, I just find the "we're the good guys" approach to be suspicious at best.
Apple's monopoly (because I have an iPhone) has been of incredible value to me so I prefer that the monopoly continue to exist. As we remove that monopoly I see more consumer harm done than good.
> considering inflation and rising supply chain costs
I just can't for the life of me figure out where this money goes. People bought the same type of things 10 years ago, and the cost now isn't proportional to the cost 10 years ago.
Who cares? Their lawsuit made it way better for mobile devs in the US, including me, for selling apps. Epic can do whatever they want for their own stores as far as I care.
I wouldn't die on this hill. Epic is about as unsympathetic a character as you'll find anywhere in the videogame space. Epic wasn't trying to be altruistic.
That sounds awfully similar to what Opus 4.6 does on my tasks sometimes.
> Blah blah blah (second-guesses its own reasoning half a dozen times, then goes) Actually, it would be simpler to just ...
Specifically on Antigravity, I've noticed it doing that trying to "save time" to stay within some artificial deadline.
It might have something to do with the system messages and the reinforcement/realignment messages that are interwoven into the context (but never displayed to end-users) to keep the agents on task.
As someone that started using Co-work, I feel like I am going insane with the frequency that I have to keep telling it to stay on task.
If you ask it to do something laborious, like reviewing a bunch of websites for specific content, it will constantly give up, providing you information on how you can continue the process yourself to save time. It's maddening.
That’s pretty funny when compared with the rhetoric like “AI doesn’t get tired like humans.” No, it doesn’t, but it roleplays like it does. I guess there is too much reference to human concerns like fatigue and saving effort in the training.
This is what happens when a bunch of billionaires convince people autocomplete is AI.
Don't get me wrong, it's very good autocomplete and if you run it in a loop with good tooling around it, you can get interesting, even useful results. But by its nature it is still autocomplete and it always just predicts text. Specifically, text which is usually about humans and/or by humans.
You are not wrong, but after having started working with LLMs, I have this feeling that many humans are simply autocomplete engines too. So LLMs might be actually close to AGI, if you define "general" as "more than 50% of the population".
Humans are absolutely auto-complete engines, and regularly produce incorrect statements and actions with full confidence that they are precisely correct.
Just think about how many thousands of times you've heard "good morning" after noon both with and without the subsequent "or I guess I should say good afternoon" auto-correct.
Well, the essence of software engineering is taking these complex real-world tasks and breaking them down into simpler parts until they can be done by (conceptually) simple digital circuits.
So it's not surprising that eventually autocomplete can reach up from those circuits and take on some tasks that have already been made simple enough.
I think what's so interesting is how uneven that reach is. At some tasks it is better than at least 90% of devs, maybe even superhuman (by which I mean better than any single human; I've never seen an LLM do something that a small team couldn't do better given a reasonable amount of time). In other cases, actual old-school autocomplete might do a better job: the extra capabilities added up to negative value and its presence was a distraction.
Sometimes there is an obvious reason why (solving a problem with lots of example solution online vs working with poorly documented proprietary technologies), but other times there isn't. They certainly have raised the floor somewhat, but the peaks and valleys remain enormous which is interesting.
To me that implies there is both lots of untapped potential and challenges the LLM developers have not even begun to face.
Yep. The veil of coherence extends convincingly far by means of absurd statistical power, but the artifacts of next-token prediction become far more obvious when you're running models that can work on commodity hardware.
> As someone that started using Co-work, I feel like I am going insane with the frequency that I have to keep telling it to stay on task.
Used to have the same thing happening when using Sonnet or Opus via Windsurf.
After switching to Claude Code directly though (and using "/plan" mode), this isn't a thing any more.
So, I reckon the problem is in some of these UIs/tools, and probably isn't in the models they're sending the data to. Windsurf, for example, we no longer use due to its inferior results.
In my experience all of the models do that. It's one of the most infuriating things about using them, especially when I spend hours putting together a massive spec/implementation plan and then have to sit there babysitting it going "are you sure phase 1 is done?" and "continue to phase 2"
I tend to work on things where there is a massive amount of code to write but once the architecture is laid down, it's just mechanical work, so this behavior is particularly frustrating.
I hope you will excuse my ignorance on this subject, so as a learning question for me: is it possible to add what you put there as an absolute condition, that all available functions and data are present as an overarching mandate, and it’s simply plug and chug?
Recently it seems that even if you add those conditions, the LLMs will tend to ignore them. So you have to repeatedly prompt them. Sometimes strong or emphatic language will help them keep it “in mind”.
I found it better to split the work into smaller tasks from a first overall analysis, have it do only that subtask, and have it give me the next prompt once finished (or feed that to a system of agents). There is a real threshold beyond which quality is lost.
Yeah that happened to me with Claude code opus 4.6 1M for the first time today. I had to check the model hadn’t changed. It was weird. I was imagining that maybe anthropic have a way of deciding how much resource a user actually gets and they had downgraded me suddenly or something.
But how do you see the current thinking level and how do you change it? I’ve been clicking around and searching and adding “effortLevel”:”high” to .claude/settings.json but no idea if this actually has any effect etc.
Haha yeah I've had this happen to me too (inside copilot on GitHub). I ask it to make a field nullable, and give it some pointers on how to implement that change.
It just decided halfway that, nah, removing the field altogether means you don't have to fix the fallout from making that thing nullable.
Opus 4.6 found in my documentation how to flash the device and wanted to be clever and helpful and flash it for me after doing a series of fixes. I'd got used to approving commands and missed that one. So it bricked it.
Then I wrote extra instructions saying flashing of any kind is forbidden. A few days later it did it again and apologised...
> Rust happens to be an extremely good tool.
Sir (or ma’am), you stole literally the line I came to write in the comments!
To anyone new picking up Rust, beware of shortcuts (unwrap() and expect() when used unwisely). They are fine for prototyping but will leave your app brittle, as it will panic whenever things do not go the expected way. So learn early on to handle all pathways in a way that works well for your users.
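A minimal sketch of the difference (the function names here are made up for illustration, not from any real codebase): the brittle version panics the moment input is malformed, while returning a `Result` lets the caller decide how to recover.

```rust
use std::num::ParseIntError;

// Brittle: panics on any input that isn't a valid u16,
// taking the whole app down with it.
fn parse_port_brittle(s: &str) -> u16 {
    s.trim().parse().unwrap()
}

// Robust: the `?` operator propagates the error to the caller,
// which can then report it to the user or fall back to a default.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?;
    Ok(port)
}

fn main() {
    // Note the typo: a letter O instead of a zero.
    match parse_port("8O80") {
        Ok(p) => println!("listening on port {p}"),
        Err(e) => eprintln!("invalid port: {e}"),
    }
    let _ = parse_port_brittle("8080"); // fine here, but one bad input away from a panic
}
```

The `?`-based version costs one extra line and a `Result` in the signature, and in exchange the "unexpected input" path becomes a value you handle instead of a crash.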
Also, if you’re looking for a simpler experience (like Rust but less verbose), Swift is phenomenal. It does not have a GC; it uses ARC automatically. I spent months building a layer on top of Rust that removed ownership and borrow considerations, only to realize Swift already does it, and really well! Swift also has a stable ABI, making it great for writing apps with compiled dynamic components such as plugins and extensions. Its cross-platform story is much better today, and you can expect similar performance on all OSes.
For me personally, this relegates Rust to single-threaded tasks, as I would happily take the 20% performance hit with Swift for the flexibility I get when multithreading. My threads can share mutable references without fighting the borrow checker, because that’s just a bad use case for Rust (one it was not designed for). Part of my work is performance-critical, and that often becomes a bottleneck for me. But it shouldn’t be a problem for anyone else using Arc<RwLock<…>>. Anyway, they’re both great languages, and for a CLI tool or utility you can’t go wrong with either.
Are you guys affiliated with Meta’s ex-CTO in any way? I remember he famously implied that LLMs are overhyped. The demos are very impressive. Does this use an attention-based mechanism too? Just trying to understand (as a layman) how these models handle context and whether long contexts lead to weaker results. That could be catastrophic in the real world!
I think in the long run, we may need something like a batch job that compresses context from the last N conversations (in LLMs) and applies that as an update to weights. A looser form of delayed automated reinforcement learning.
Or make something like LoRA mainstream for everyone (probably scales better for general use models shared by everyone).
If it’s any consolation, it was able to one-shot a UI & data sync race condition that even Opus 4.6 struggled to fix (across 3 attempts).
So far I like how it’s less verbose than its predecessor. Seems to get to the point quicker too.
While it gives me hope, I am going to play it by ear. Otherwise it’s going to be: Gemini for world knowledge/general intelligence/R&D, and Opus/Sonnet 4.6 to finish it off.
UPDATE: I may have spoken too soon.
> Fixing Truncated Array Syncing Bug
> I traced the missing array items to a typo I made earlier!
> When fixing the GC cast crash, I accidentally deleted the assignment..
> ..effectively truncating the entire array behind it.
These errors should not be happening! They are not the result of missing knowledge or a bad hunch. They are coming from an incorrect find/replace, which makes them completely avoidable!
For me it's Opus 4.6 for researching code/digging through repos, gpt 5.3 codex for writing code, gemini for single hardcore science/math algorithms and grok for things the others refuse to answer or skirt around (e.g. some security/exploitability related queries). Get yourself one of those wrappers that support all models and forget thinking about who has the best model. The question is who has the best model for your problem. And there's usually a correct answer, even if it changes regularly.
Interesting, I've had similar issues. It seems to be very clumsy when using its internal tooling. I've seen diffs where it accidentally garbled significant amounts of code, which it then had to go in and manually fix. It's also introduced bugs into features that it wasn't supposed to be touching, and when I asked why it was making changes to the other code, it answered that it had failed to copy-paste some large blocks of code correctly.
Yeah, I wholeheartedly agree with this. Even Codex does this sometimes, although it has been consistently much better than the others at following instructions.
The problem is again that you can’t ever fully trust an agent did exactly what you asked for and in the exact manner that you had hoped.
It works just like dealing with a human companion. Trust takes time to build. Over time you learn the other individual’s weaknesses and support them there.
What makes it a bit challenging right now is the pace of innovation. By the time we get used to a model’s personality, a new update comes out that alters it in unknown ways. Now you’re back to square one.
I’ve been experimenting with asking one frontier model to check on another’s work. That’s proven to be better than doing nothing. Usually they’ll have some genuinely useful feedback.