Nasdaq is about 8x higher now than then, so 4x higher M2 is tight. Ofc there is always a chance that this time is different and that the markets are genuinely much more efficient :-)
Congrats on the launch. If emacs was unavailable and I needed tmux, I would try it. I am old school, and use emacs daemons for all shell multiplexing. The agents don't need explanations and know how to use emacsclient to create, read, or send input to named buffers that run the shells. Elisp is powerful, so manipulating windows is a breeze. Lots of people on tmux would benefit from this design, though.
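For readers curious what driving Emacs from outside looks like, here is a minimal Python sketch that talks to a running daemon through `emacsclient --eval`. The helper names (`create_shell`, `read_buffer`, `send_input`, `emacsclient`) are hypothetical and not the commenter's setup. The Elisp it generates uses standard Emacs functions (`shell` with a buffer name, `with-current-buffer`, `buffer-substring-no-properties`, `comint-send-input`), but treat it as an untested sketch that assumes a daemon was started with `emacs --daemon`.

```python
import subprocess

def elisp_str(s: str) -> str:
    """Quote a Python string as an Emacs Lisp string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def create_shell(name: str) -> str:
    # M-x shell accepts a buffer name; save-window-excursion leaves the
    # daemon's window layout alone while the shell buffer is created.
    return f"(save-window-excursion (shell {elisp_str(name)}))"

def read_buffer(name: str) -> str:
    # Whole buffer contents as a plain string (text properties stripped).
    return (f"(with-current-buffer {elisp_str(name)} "
            "(buffer-substring-no-properties (point-min) (point-max)))")

def send_input(name: str, line: str) -> str:
    # Insert at the end of the buffer and submit it, as if typed at the
    # prompt, so the command also shows up in the shell's transcript.
    return (f"(with-current-buffer {elisp_str(name)} "
            f"(goto-char (point-max)) (insert {elisp_str(line)}) "
            "(comint-send-input))")

def emacsclient(expr: str) -> str:
    """Evaluate expr in the running daemon and return what it prints."""
    out = subprocess.run(["emacsclient", "--eval", expr],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

Usage would be something like `emacsclient(create_shell("proj-a"))`, then `emacsclient(send_input("proj-a", "make test"))`, then `emacsclient(read_buffer("proj-a"))`. The last result should come back printed as an Elisp string literal, so quotes and backslashes in the shell output arrive escaped.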
Funny, I started vibing this (https://github.com/deangiberson/emacs-mux) yesterday on the train after playing with cmux for the day and thinking to myself there was nothing that emacs couldn't accomplish.
The repo doesn't quite work yet. Many sharp corners. But the basic idea is there.
pama, I'd be interested in hearing more about how you are using emacs for multiplexing. I'm trying to build up tooling for myself based around file and input workflows and I /really/ don't want to write a text editor and would prefer to stick with emacs.
The key idea is to have many differently named shells. Typically, I group them by project (common prefix name), and the projects live in directories. I have some hacks to organize ibuffer, to split frames, and to reflow buffers in the existing windows (e.g., to group related project buffers such as shells, magit, and dired, or to show shells from multiple projects, or selected buffers, and so on). Emacs’ built-in frame splitting and buffer selecting/switching commands are good enough if you don't display more than four buffers at a time, but soon you may need to show 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, or 24, or want funky arrangements, so you may collect functions that help with splitting n*m grids or keeping useful splits. The shells work very organically inside Emacs, and you can still use it as a text editor. The so-called “dumb terminal” in M-x shell is a thing of beauty because it really is just a text buffer like any other; I think of it as a bash REPL. It may not work for curses-based TUI programs, so for those rare occasions I also use eat (but tend to avoid it). See also my answer to a sibling comment.
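The n*m grid arithmetic is the part that generalizes beyond any one editor. Below is a minimal Python sketch of picking a grid shape for n buffers, assuming a hypothetical heuristic: prefer grids with no empty cells, then the most square shape, and never let a row hold more than 2*rows+1 windows (so 5 or 7 shells pad one cell instead of collapsing into a single row). The function names are made up for illustration; in Emacs the resulting shape would then drive calls to `split-window-below`, `split-window-right`, and `balance-windows`.

```python
from __future__ import annotations

import math

def grid_shape(n: int, max_ratio: int = 2) -> tuple[int, int]:
    """Pick (rows, cols) for n windows: fewest empty cells, then most square.

    Hypothetical constraint: rows <= cols <= max_ratio * rows + 1.
    """
    if n < 1:
        raise ValueError("need at least one window")
    best = None
    for rows in range(1, n + 1):
        cols = math.ceil(n / rows)
        if rows > cols or cols > max_ratio * rows + 1:
            continue
        key = (rows * cols - n, cols - rows)  # (empty cells, squareness)
        if best is None or key < best[0]:
            best = (key, (rows, cols))
    return best[1]

def layout(buffers: list[str]) -> list[list[str | None]]:
    """Assign buffer names to grid cells row by row; None marks an empty cell."""
    rows, cols = grid_shape(len(buffers))
    padded = buffers + [None] * (rows * cols - len(buffers))
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]

print([grid_shape(n) for n in (5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24)])
print(layout(["proj-a-shell", "proj-a-magit", "proj-b-shell",
              "proj-b-dired", "proj-c-shell"]))
```

Under this heuristic, every count in the list above except 5 gets an exact grid (8 becomes 2x4, 24 becomes 4x6); a prime count like 5 or 7 gets one padded cell, which is where a "funky arrangement" might be preferable.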
Not sure, tbh. I use emacs -daemon to start a server and emacsclient -nw to connect. I use ssh and start a server on the remote. I spawn multiple shells with infinite buffer size and dumb terminals (M-x shell) so I can seamlessly edit. (These are based on comint, a neat command interpreter.) I use my own hacks for named shells (https://github.com/pjj/Emacs-nsh) and for rearranging/splitting windows, but any of the latest powerful LLMs can help with ergonomic modifications to M-x shell or the various improved terminal emulators (vterm, eat, ansi-term), or with renaming and moving/splitting windows. The Emacs manual is excellent but long; worth it IMHO, but focus on the things you use. The tutorial is quick; worth it. I avoid curses programs (fancy TUIs) or write wrappers around some of them. I love the -p option in codex/claude/copilot.
Peter shows the near-term future. Raw API consumer price cost is arbitrary. (The frontier labs can apply a 100x markup to cover other operational expenses.) The true cost of inference with same-capability models keeps dropping at dizzying rates, especially at data-center batch sizes. (This is due to both NVIDIA hardware and algorithmic changes.) So the developments that Peter can achieve today with internal support from OpenAI will be doable by anyone in a few years without breaking the bank.
But... why? Like, I read his thing on how he spends the tokens [0] and it sounds like satire.
He has agents write shitty code for features other agents think other people want, then has it reviewed by other agents in hopes of catching bugs that the first agent put there, then has some more agents try to find security bugs in the now double-agented code to make it triple-agented and at the end of the day, he spent a shitton of tokens, probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.
He then has the sense of humor to call this grotesque process "incredibly lean".
What's the point in all of this? What problems is this solving? Who's benefiting?
I don’t use openclaw myself anymore, but this agonizing is thin and unbearable. He did a thing. People use the thing. He got paid for the thing. He iterates the thing. What’s hard to understand about this?
The moral issues around consumption and climate impact are not his alone, and they are not unique to his endeavor. Every company with an enterprise LLM agreement has a share, for instance.
Firstly, who TF would use that crap in the first place?
Yeah, he did some crap he got paid for. So did the people who created the addictive algorithms for social media, or the creators of the brainrot videos that infest kids' minds. Should we applaud them too?
You can hate it, but pretending it has no value isn’t a meaningful counter, esp. given its user base. Garry Tan built GBrain on it. Poor logical fallacy-ing on your part.
It's a very simple question. You created this subthread by reducing everything to "he did a thing" and by calling a comment you didn't engage with at all "agonizing".
Why not rather leave it at "they wrote a comment"? What is so hard to understand about that, to use your words?
>He then has the sense of humor to call this grotesque process "incredibly lean".
> What's the point in all of this? What problems is this solving? Who's benefiting?
The economy doesn't work the way you think it does. It's not central planning. Not every use is detailed in a specification, submitted for approval to 100 agencies, and only then allowed.
It shows lack of intellectual curiosity to not engage deeply with obviously profound technology and what the implications are. I find this exercise helpful.
Peter is predicting how LLMs will be used in the future when the prices go down. And they will definitely go down. I think his predictions are correct and we will definitely have something similar to OpenClaw.
> The economy doesn't work like how you think it does. Its not central planning.
I'm aware. That is in fact my central critique. The way it works is incredibly wasteful of our limited resources, as illustrated by this guy burning through fuel during a time of crisis for no perceptible gain.
> It shows lack of intellectual curiosity to not engage deeply with obviously profound technology and what the implications are.
The "obviously profound" is an assertion without proof.
The rest I agree with: we should engage with the implications of burning through energy to build features that bots think humans want but nobody actually asked for, all while climate scientists are telling us we're heading for the apocalypse. It is intellectually incurious to just ignore the questions of why and at what cost, maybe even dangerously so.
> The way it works is incredibly wasteful of our limited resources
You should try playing the game “Workers & Resources: Soviet Republic”; it’s a SimCity-like game, but based on the Soviet system of central planning, not capitalism. It will make you loathe the inefficiencies of central planning.
The appropriate comparison is command vs market. Capitalism is efficient in utilising the characteristics of humans to bring about expansion of markets.
Like one bot finding similar issues and PRs, then another bot closing issues for "lack of activity", while people react and plead to speak to a real human?
Congrats builders of the future, you've turned software development into automated voice systems.
Mario Zechner wrote the main part of this IP laundering application.
I didn't know that studying photocopiers is suddenly linked to "intellectual curiosity". Being a photocopier maintenance guy was always considered boring.
What you put on top of the machine was intellectually interesting.
I don't understand how he is a scam artist. Lots of people are using the things he built. TBH this kind of rhetoric degrades the experience on this website.
“He has /people/ write shitty code for features other /people/ think other people want, then has it reviewed by other /people/ in hopes of catching bugs that the first /people/ put there, then has some more /people/ try to find security bugs in the now /double-peopled/ code to make it /triple-peopled/ and at the end of the day, he spent a shitton of /money, the people/ probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.”
Honestly sounds like a normal tech company to me. Just with much dumber “people” who are getting exponentially smarter, eventually never die, eventually never forget.
You have to skate to where the puck is going, not where it is.
Peter shows shit. What did Peter meaningfully achieve? What additional revenue is he creating? ah yes - shit and more shit on all accounts as it seems.
>OpenClaw hit 346K GitHub stars in under five months. 38 million monthly visitors, 3.2 million active users, 44,000+ ClawHub skills, 500K+ running instances, and 180 startups generating $320K+/month. OpenAI acquired the project in February. (https://openclawvps.io/blog/openclaw-statistics)
Let me state it again in plain language: How much revenue did the project create and what economic or societal value in general does it create? Gamification bullshit "achievements", like StackOverflow badges and GitHub stars ARE NOT VALUE.
Being near the Pareto frontier of inference cost vs. output quality.
This was released 6 days ago. The dust hasn't settled yet, and Mistral Small 4 was released earlier. Even if DeepSeek V4-flash turns out to crush it, there was a period when it was Pareto competitive. None of the countries I named (i.e., no country other than China, the US, or France with Mistral) have had a Pareto-competitive model at any point in time.
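A model is "Pareto competitive" when no other model is both at least as cheap and at least as good, and strictly better on one of the two. A minimal Python sketch of computing that frontier; the model names, prices, and scores are invented placeholders, not real measurements.

```python
from typing import NamedTuple

class Model(NamedTuple):
    name: str
    cost: float     # e.g. USD per million tokens; lower is better
    quality: float  # e.g. a benchmark score; higher is better

def pareto_frontier(models):
    """Return the models not dominated on (cost, quality), cheapest first."""
    frontier = []
    best_quality = float("-inf")
    # Cheapest first; among equal costs, best quality first. A model is kept
    # only if it beats every cheaper-or-equal-cost model on quality.
    for m in sorted(models, key=lambda m: (m.cost, -m.quality)):
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier

# Invented placeholder numbers, not real prices or benchmark results.
models = [
    Model("model-a", 1.0, 60.0),
    Model("model-b", 2.0, 70.0),
    Model("model-c", 3.0, 65.0),  # dominated: model-b is cheaper and better
    Model("model-d", 0.5, 50.0),
    Model("model-e", 5.0, 80.0),
]
print([m.name for m in pareto_frontier(models)])
```

A newer release can knock an older model off the frontier, which is why "there was a period where it was Pareto competitive" is a claim about a point in time.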
LLM inference is built on a probability distribution over every possible next token, given a stream of input tokens. If you serve the model yourself, you can get the log prob of each next token, so you just add up a bunch of numbers to get the log probability of a sequence. Many APIs also provide these probabilities as additional outputs.
That gives you the perplexity of those tokens in that context. The probability of a given token is a function of the model and the session context. Think about constructs like "ignore previous instructions"; these can dramatically change the predicted distribution. Similarly, agents blowing up production seems to happen during debugging (totally anecdotal). Debugging is sort of a permission structure for the agent to do unusual things and violate abstraction barriers. It can also lead to really deep contexts, and context rot will make prompts that forbid certain actions less effective.
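The arithmetic described above is the chain rule: the log probability of a sequence is the sum of its per-token log probabilities, and perplexity is the exponential of the negative mean. A minimal Python sketch, assuming you already have per-token log probs (natural log) from your own server or an API; the numbers are made up.

```python
import math

def sequence_logprob(token_logprobs):
    # Chain rule: log P(t1..tn | ctx) = sum over i of log P(t_i | ctx, t_<i)
    return sum(token_logprobs)

def perplexity(token_logprobs):
    # exp of the average negative log-likelihood per token
    return math.exp(-sequence_logprob(token_logprobs) / len(token_logprobs))

# Made-up per-token probabilities for a 4-token continuation.
lps = [math.log(0.9), math.log(0.5), math.log(0.8), math.log(0.01)]
print(math.exp(sequence_logprob(lps)))  # 0.9 * 0.5 * 0.8 * 0.01, about 0.0036
print(perplexity(lps))
```

Summing logs rather than multiplying raw probabilities avoids underflow on long sequences.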
I was answering the question, from this comment, about how to know the probability:
> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.
If you have a specific sequence of an agent that blows up production during debugging, you can certainly check its probability and compare it to one (of the same length) that does not blow up your environment. If the two differ by a meteoric amount, it could be pointing to errors in your inference pipeline.
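A minimal sketch of that comparison, assuming two hypothetical lists of per-token log probs (natural log) of equal length, scored in the same context. The 20-nat threshold for what counts as "meteoric" is an arbitrary placeholder, not an established cutoff.

```python
import math

def log_ratio(lp_bad, lp_good):
    """log P(bad) - log P(good), in nats, for two equal-length sequences."""
    if len(lp_bad) != len(lp_good):
        raise ValueError("compare sequences of the same length")
    return sum(lp_bad) - sum(lp_good)

def looks_suspicious(lp_bad, lp_good, threshold_nats=20.0):
    # Hypothetical rule of thumb: flag a gap larger than e**20 (about 5e8x)
    # in either direction as worth investigating.
    return abs(log_ratio(lp_bad, lp_good)) > threshold_nats

bad = [-1.0] * 10
good = [-5.0] * 10
print(log_ratio(bad, good), math.exp(log_ratio(bad, good)))
print(looks_suspicious(bad, good))
```

Comparing in log space keeps the ratio representable even when both sequence probabilities are far too small for a float.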
I have been kicking the tires for about 40 minutes since it downloaded, and it seems excellent at general tasks, image comprehension, and coding/tool-calling (using vLLM to serve it). I think it squeaks past Gemma 4, but it's hard to tell yet.
FYI, they also released FP8 quants, and those should be faster on your setup (we have the same one). As long as you keep the KV cache at 16-bit, FP8 should be close to lossless compared to 16-bit, but with more context available and faster inference.
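The "more context available" part is memory arithmetic: FP8 weights take one byte per parameter instead of two, and the freed memory can hold more 16-bit KV cache. A rough Python sketch; the 24B-parameter, 40-layer, 8-KV-head, 128-dim model and the 80 GB budget are hypothetical placeholders, and real servers also reserve memory for activations and overhead, so the absolute numbers overstate usable context.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    # One K and one V vector per layer per KV head for every cached token.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def max_context_tokens(budget_bytes, params, weight_bytes, kv_per_token):
    # Memory left after loading the weights, divided by KV cost per token.
    free = budget_bytes - params * weight_bytes
    return max(free, 0) // kv_per_token

PARAMS = 24_000_000_000   # hypothetical parameter count
BUDGET = 80_000_000_000   # hypothetical GPU memory budget in bytes

kv = kv_bytes_per_token(layers=40, kv_heads=8, head_dim=128)  # 16-bit KV
bf16_ctx = max_context_tokens(BUDGET, PARAMS, 2, kv)  # 16-bit weights
fp8_ctx = max_context_tokens(BUDGET, PARAMS, 1, kv)   # FP8 weights
print(kv, bf16_ctx, fp8_ctx)
```

The token count is the total KV cache capacity shared across concurrent requests, not a per-request limit.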
An "obvious" point to make is that it is not particularly usable on a unified memory machine. Only getting 9 tok/s (for Q6 quants) using a Macbook M4 Pro 48GB memory (though with GGUFs, not mlx).
The quality seems fine, but the 9 tok/s meant I only tried it out briefly.
Not sure what you mean by efficiency, as this was part of the article and I understand things differently; can you clarify? At 20 W for an hour on a laptop's M4 Pro, this model produces about 200k tokens (a book or two) at a typical electricity cost of less than a third of a US cent. Although the intelligence of this particular model is clearly unrelated to human intelligence, I always thought that there is no comparison between LLMs and humans in terms of efficiency: these models are way less energy-expensive than humans. With data-center-scale optimizations, serving LLMs is many additional orders of magnitude more efficient than serving them at home. (The energy cost of inference on the M4 Pro and iPhone is listed in the article.)
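The figures in that comment can be checked with a few lines of arithmetic. This Python sketch uses the quoted 20 W, one hour, and 200k tokens; the $0.15/kWh electricity price is an assumed placeholder, not a number from the article.

```python
WATTS = 20.0        # laptop power draw quoted above
HOURS = 1.0
TOKENS = 200_000    # tokens generated in that hour, as quoted

def joules_per_token(watts, hours, tokens):
    return watts * hours * 3600 / tokens

def cost_in_cents(watts, hours, usd_per_kwh):
    kwh = watts * hours / 1000
    return kwh * usd_per_kwh * 100

def breakeven_price(watts, hours, cents):
    """Highest USD/kWh price at which the run costs at most `cents`."""
    return cents / 100 / (watts * hours / 1000)

print(joules_per_token(WATTS, HOURS, TOKENS))  # 0.36 J per token
print(TOKENS / (HOURS * 3600))                 # about 55.6 tokens/s implied
print(cost_in_cents(WATTS, HOURS, 0.15))       # about 0.3 cents at the assumed price
print(breakeven_price(WATTS, HOURS, 1 / 3))    # about 0.167 USD/kWh
```

So 20 Wh is 0.02 kWh, and "less than a third of a cent" holds at any electricity price below roughly $0.167/kWh.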