The Turing test pits a human against a machine, each trying to convince a human questioner that the other is the machine. Since the machine knows how humans generally behave, a proper test requires that the human contestant know how the machine behaves, too. I think this YouTube channel clearly shows that none of today's models pass the Turing test: https://www.youtube.com/@FatherPhi
How have you used the Curry Howard correspondence to make proving the correctness of non-trivial algorithms easier (than, say, Isabelle/HOL or TLA+ proofs)?
I hardly use automated formal methods. Disappointing, I know. I use the correspondence for thinking through C and LabVIEW programs: it helps with recognizing patterns in data structures and reasoning through code.
For example, malloc returns either null or a pointer. That is an "or" type, but C can't represent that. I use an if statement to decide which case I'm in (or-elimination), and then call exit() in case of a null. exit() returns an empty type, but C can't represent that properly either (the closest is the _Noreturn function specifier). I wrap all of this in my own malloc_or_error function, and I conclude that it will only return a valid pointer.
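A minimal sketch of that wrapper in standard C (the error message and exact signature are my own choices):

    #include <stdio.h>
    #include <stdlib.h>

    /* malloc gives back "a valid pointer OR null" - an "or" type C can't
       express, so or-elimination becomes an if statement. The null branch
       ends in exit(), whose "empty" return type C can only approximate
       with the _Noreturn specifier. */
    void *malloc_or_error(size_t size) {
        void *p = malloc(size);
        if (p == NULL) {                  /* or-elimination: pick the case */
            fprintf(stderr, "out of memory\n");
            exit(EXIT_FAILURE);           /* empty case: no value flows back */
        }
        return p;                         /* only the valid-pointer case is left */
    }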
Instead of automating a correctness proof in a different language, I run it in my own head. I can make mistakes, but it still helps me write better code.
Oh, so I have used formal methods for many years (and have written about them [1]), including proof assistants, and have never found that constructive logic in general and type theory in particular makes proofs of program correctness any easier. The Curry-Howard correspondence is a cute observation (and it is at the core of Agda), but it's not really practically useful as far as proving algorithm correctness is concerned.
Well, to translate my words to your liking: "In my opinion, everyone already uses a sort of constructive logic for programming."
I challenge you on "most proofs of algorithm correctness use classical logic". That means double negation elimination, or excluded middle. I bet most proofs don't use those. Give examples.
Oh, if you mean that most algorithm correctness proofs are finitary and therefore don't need to explicitly rely on the excluded middle, that may well be the case, but they certainly don't try to avoid it either. Look at any algorithm paper with a proof of correctness and see how many of them explicitly limit themselves to constructive logic. My point isn't that most algorithm/program proofs need the excluded middle, it's that they don't benefit from not having it, either.
> My point isn't that most algorithm/program proofs need the excluded middle, it's that they don't benefit from not having it, either.
Because if they benefited from it (in surfacing computational content, which is the whole point of constructive proof), that content would already be contained in the algorithm, not the proof.
> in surfacing computational content, which is the whole point of constructive proof
The point of a constructive proof is that the proof itself is in some way computational [1], not that the algorithm is. When I wrote formal proofs, I used either TLA+ or Isabelle/HOL, neither of which are constructive. It's easy to describe the notion of "constructive computation" in a non-constructive logic without any additional effort (that's because non-constructive logics are a superset of constructive logics; i.e. they strictly admit more theorems).
> When I wrote formal proofs, I used either TLA+ or Isabelle/HOL, neither of which are constructive.
True, but this requires using different formal systems for the algorithm and the proof. Isabelle/HOL being non-constructive means you can't fully express proof-carrying code in that single system, without tacking on something else for the "purely computational" added content.
That's not true. Non-constructive logics are extensions of constructive logics. You can express any algorithm in TLA+, and much more than algorithms.
You are right that when using non-constructive logics, it's not guaranteed that the proof is executable as a program, but that's not a downside. Having the proof be a program in some sense is interesting, but it's not particularly useful.
How do you express computational content in non-constructive logic while both making it usable from proofs (e.g. if I have some algorithm that turns A's into B's, I want that to be directly referenceable in a proof - if A's have been posited, I must be able to turn them into B's) and keeping its character as specifically computational? Expressing algorithms in a totally separate way from proofs is arguably not much of a solution.
Not only is it easy, the ability to extend the computable into the non-computable is quite convenient. For example, computable numbers can be directly treated as a subset of the reals.
You create a subset or model of what's computable. Then, work in it. You might also prove refinements from high- to low-level forms.
Interestingly, we handle static analysis the same way by using language subsets. The larger chunk is unprovable. So, we just work with what's easy to analyze. Then, wrap it in types or contracts to use it properly.
And plenty of testing for when the specs are wrong.
Proofs of safety are proving a negative: they're all about what an algorithm won't do. So constructivism is irrelevant to those, because the algorithm has provided all the constructive content already! Proofs of liveness/termination are the interesting case.
You might also add designing an algorithm to begin with, or porting it from a less restrictive to a more restrictive model of computation, as kinds of proofs in CS that are closely aligned to what we'd call constructive.
The difference only becomes evident when proving liveness/termination (since if your algorithm terminates successfully it has to construct something, and it only has to be proven that it's not incorrect) and then it turns out that these proofs do use something quite aligned to constructive logic.
... and also to classical logic. Liveness proofs typically require finding a variant that converges to some terminal value, and that's just as easy to do in classical logic as in constructive logic.
I've been using formal methods for years now and have yet to see where constructive logic makes things easier (I'm not saying it necessarily makes things harder, either).
Governments have long funded artistic projects. I'm sure some people oppose government funding for the arts, but there's nothing unusual about it. Obviously, not all artists get government funding, but such funding is an established process.
Feminism is not femininity and so is not to be contrasted with masculinity [1].
Feminism is originally about gender (power-) equality (and so is orthogonal to femininity and masculinity), but has been extended to other forms of power equality. I think that in this context it's about concern for certain things that established practices don't show concern for. Such concern could perhaps translate to certain power dynamics.
[1]: One of the feminist icons in recent popular culture is Ron Swanson from Parks and Recreation, who is also an icon of butch masculinity. I don't know if he would have loved or hated this. On the one hand, the description sounds hippie, which he would have hated; on the other hand, it's about do-it-yourself, non-industrial craftsmanship, which he would have loved.
Yes, that's exactly the focus of modern feminist studies. Figures like Donna Haraway have pushed for a field of study that goes beyond identities of womanhood.
> She advocates for political organizing based on "affinity"—conscious coalitions and political choices—rather than essentialist identities based on biology or shared oppression.
If the goal is to decouple feminism from feminine identities, which by definition means it then also needs to apply to masculine identities, then I think they need a new name.
Also, it appears that >99% of feminism researchers publish their papers under a feminine name. I can easily understand why the general public might confuse the two groups with each other.
Which brings me back to the question: what do you think the authors hope to gain by invoking this association? Especially now that we have established that their word choice is highly likely to be misunderstood?
First, confusing feminism with femininity or, conversely, patriarchy with masculinity is such a basic error - and not one of nuance - that it shows at least an intentional disinterest. There is no "goal to decouple", because if an ideology believes a certain group is disempowered, then it strives to empower that group; there is no "decoupling" involved. But if you can't tell the difference between, say, being white and being a white supremacist, then you should probably find out what it is.
Second, every academic discipline, from history to physics, suffers from misinterpretation by "the general public", and the disciplines don't generally let this problem shape their work. Non-introductory writing doesn't cover the basics. That's what Wikipedia is for.
The Democratic Republic of Congo holds between 60% and 80% of the world's coltan reserves, a key input to capacitors and other discrete electronics. UN investigators have identified systematic rape and sexual violence as a strategy of the armed groups controlling regions containing these minerals: over 113k individual instances in 2023 alone. Phones keep getting made.
To me, this project is arguing that we don’t necessarily need to tolerate systemic rape, exploitation, economic inequality, and other forms of violence to have our little circuits.
POC, sure (although 10x-ing a POC doesn't actually get you 10x velocity). MVP, though? No way. Today's frontier models are nowhere near smart enough to write a non-trivial product (i.e. something that others are meant to use), minimal or otherwise, without careful supervision. Anthropic weren't able to get agents to write even a usable C compiler (not a huge deal to begin with), even with a practically infeasible amount of preparatory work (write a full spec and a reference implementation, train the model on them as well as on relevant textbooks, write thousands of tests). The agents just make too many critical architectural mistakes that pretty much guarantee you won't be able to evolve the product for long, with or without their help. The software they write has an evolution horizon between zero days and about a year, after which the codebase is effectively bricked.
There are a million things in between a C compiler and a non-trivial product. They do make a ton of horrible architectural decisions, but I only need to review the output and ask questions to guide that, not review every diff.
A C compiler is a 10-50KLOC job, which the agents bricked in 0 days despite a full spec and thousands of hand-written tests - tests that the software passed until it collapsed beyond saving. Yes, smaller products will survive longer, but without looking, how would you know about the time bombs that agents like hiding in their code? When I review the diffs, I see things that, had I let them in, would have killed the codebase in 6-18 months.
BTW, one tip is to look at the size of the codebase. When you see 100KLOC for a first draft of a C compiler, you know something has gone horribly wrong. I would suggest that you at least compare the number of lines the agent produced to what you think the project should take. If it's more than double, the code is in serious, serious trouble. If it's in the <1.5x range, there's a chance it could be saved.
Asking the agent questions is good - as an aid to a review, not as a substitute. The agents lie with a high enough frequency to be a serious problem.
The models don't yet write code anywhere near human quality, so they require much closer supervision than a human programmer.
A C compiler with an existing C compiler as oracle, existing C compilers in the training set, and a formal spec, is already the easiest possible non-trivial product an agent could build without human review.
You could have it build something that takes fewer lines of code, but you aren't going to find much with that level of specification and guardrails.
> We're also a neural network, are we any more clever than a simulated one?
This is tangential, but it is highly unlikely that we are "a neural network". Neural networks are an architecture loosely inspired by some aspects of the brain, but, e.g., it's highly unlikely that we learn by backpropagation (neural signals don't travel in that direction). The brain is a network of neurons, but "neural networks" are something else, and they probably don't work the way the brain does.
The problem is that the search space is so large that correcting errors via guardrails is only effective if the original error rate is low (how many Integer -> Integer functions are there? There's ~1 way to get it right and ~∞ ways to get it wrong).
Sure, we can help the easy cases, but that's because they're easy to begin with. In general, we know (or at least assume) that being able to check a solution tractably does not make finding the solution tractable, or we'd know that NP = P. So if LLMs could effectively generate a proof that they've found the correct Integer -> Integer function, either that capability will be very limited or we've broken some known or assumed computational complexity limit. As Philippe Schnoebelen discovered in 2002 [1], languages cannot reduce the difficulty of program construction or comprehension.
Of course, it is possible that machine learning could learn some class of problems previously unknown to be in P and find that it is in P, but we should understand that that is what it's done: realised that the problem was easy to begin with rather than finding a solution to a hard problem. This is valuable, but we know that hard problems that are of great interest do exist.
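To make that check-vs-find asymmetry concrete, here's a minimal sketch in C; subset-sum is my choice of NP-complete stand-in, not anything from the references. Verifying a proposed certificate is linear, while the obvious search is exponential:

    #include <stddef.h>

    /* Verifying a proposed certificate for subset-sum: O(n). */
    int check(const int *xs, const int *pick, size_t n, int target) {
        int sum = 0;
        for (size_t i = 0; i < n; i++)
            if (pick[i]) sum += xs[i];
        return sum == target;
    }

    /* Finding a certificate by brute force: O(2^n) subsets. Tractable
       checking does not give you tractable finding. */
    int find(const int *xs, int *pick, size_t i, size_t n, int target) {
        if (i == n) return check(xs, pick, n, target);
        pick[i] = 0;
        if (find(xs, pick, i + 1, n, target)) return 1;
        pick[i] = 1;
        return find(xs, pick, i + 1, n, target);
    }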
>As Philippe Schnoebelen discovered in 2002 [1], languages cannot reduce the difficulty of program construction or comprehension.
From a model-checking point of view. This is about taking a proof-theoretic approach...
Your last paragraph is also quite wrong: a machine-learning model could very well learn to solve an NP-complete problem easily, because this property says nothing about average-case complexity (and we should consider probabilistic complexity classes, so the picture is even more "complex").
> From a model-checking point of view. This is about taking a proof-theoretic approach...
No. In complexity theory we deal with problems, and the model-checking problem is that of determining whether a program satisfies some property or not. If your logic is sound, you can certainly use an algorithm based on the logic's deductive theory (which could be type theory, but that's an unimportant detail) to decide the problem, but that can have no impact whatsoever on the complexity of the problem. The result applies to all decision procedures, be they model-theoretic or deductive (logic-theoretic).
> Your last paragraph is also quite wrong: a machine-learning model could very well learn to solve an NP-complete problem easily, because this property says nothing about average-case complexity
No. First, it's unclear what "average complexity" means here, but for any reasonable definition, the "average complexity" of NP-hard problems is not known to be tractable. Second, complexity theory approaches this issue (of "some instances may be easier") using parameterised complexity [1], and I'm afraid that the results for the model-checking problem - which, again, is the inherent difficulty of knowing what a program does regardless of how you do it - are not very good. I mentioned such a result in an old blog post of mine here [2]. (Parameterised complexity is more applicable than probabilistic complexity here because even if there were some reasonable distribution of random instances, it's probably not the distribution we'd care about.)
There is no escape from complexity limits, and the best we can hope for is to find out that problems we're interested in have actually been easier than we thought all along. Of course, some people believe that the programs people actually write are somehow in a tractable complexity class that we've not been able to define - and maybe one day we'll discover that that's the case - but what we've seen so far suggests it isn't: if programs that people write were somehow easier to analyse, we'd expect the size of programs we can soundly analyse to grow at the same pace as the size of programs people write, and nothing could be further from what we've observed. The size of programs that can be "proven correct" (especially using deductive methods!) has remained largely the same for decades, while the size of programs people write has grown considerably over that period of time.
I think there's another problem with AI doomerism, which is the belief that superhuman intelligence (even if such a thing could be defined and realised) results in godlike powers. Many if not most systems of interest in the world are non-linear and computationally hard; controlling/predicting them requires pure computational power that no amount of intelligence (whatever it means) can compensate for. On the other hand, dynamics we do (roughly) understand and can predict, don't require much intelligence, either. To the extent some problems are solvable with the computational power we have, some may require data collection and others may require persuasion through charisma. The claim that intelligence is the factor we're lacking is not well supported.
Ascribing a lot of power to intelligence (which doesn't quite correspond to what we see in the world) is less a careful analysis of the power of intelligence and more a projection of personal fantasies by people who believe they are especially intelligent and don't have the power they think they deserve.
Political power is the bottleneck for most shit that matters, not computational power.
Most of the stuff that sucks in the US sucks because of entrenched institutions with perverse interests (health insurers, tax-filing companies) and congressional paralysis, not computational bottlenecks. Raw intelligence is thus limited in what it can achieve.
>the belief that superhuman intelligence (even if such a thing could be defined and realised) results in godlike powers
My biggest criticism along these lines is the assumption that infinite intelligence means infinite knowledge. Knowledge is limited by the speed of experimentation. A lot of those experiments are extremely expensive (like CERN), and even then, they need to be repeated and verified.
You can't just assume that a super intelligence would know whether the Higgs boson exists or not. It can't know until it builds a collider.
You're assuming infinite knowledge. Infinite intelligence does not imply infinite knowledge; there are real philosophical problems with that assumption. Much of the basic information behind the standard model may be wrong or built on incorrect data, and that would be all the information an infinitely intelligent AI has to work with.
> I think there's another problem with AI doomerism, which is the belief that superhuman intelligence (even if such a thing could be defined and realised) results in godlike powers.
I agree with this. The main piece of evidence to support this is to just look at highly intelligent humans. Folks at the tail ends of the bell curve mostly don't end up with "godlike powers" or anything even approximating that, they are grinding away their life as white collar professionals working in jobs surrounded by far less intelligent peers. They may publish higher quality papers, write better software, or have better outcomes, but they're just working in the same jobs as everyone else. We have no political or economic will to build serious think tanks to work on societal-scale problems, and even if we did, nobody would listen to the outcome.
So let's assume ASI becomes a thing, what does it change?
This does suffer from a massive lack of imagination.
For example, what does it look like if your genius can spawn nearly unlimited copies of itself? Can work on a massive number of problems at the same time? Doesn't ever die or want to go get drunk (and if it does, it has unlimited copies)? Has the ability to produce nearly unlimited propaganda?
The fundamental problem is that nobody actually wants to listen to geniuses. The people leading our societies and companies are, by and large, NOT geniuses, and by and large want to surround themselves with people that agree with them, rather than the smartest and most competent people. While think-tanks do exist, including for hard research, politics, economics, and other topics that matter at societal-scale, their impact is fairly limited because they don't have the right level of influence.
So, let's assume ASI exists, what changes? ASI almost inherently will not be sycophantic to the level of current LLMs, because sycophancy and extreme levels of intelligence are inversely correlated. So it gets relegated to societal-level research that nobody makes use of because nobody wants to listen.
Most of the scenarios people fantasize about ASI assume that ASI can directly impact outcomes or that humans will listen to/follow ASI to directly impact outcomes, but humans don't listen to the other humans that already are at the tail end of the bell curve, so why do we think it'd be any different for ASI?
Then again, people do choose to listen to some people, for whatever reason. Joe Rogan is popular with a certain crowd, as are many other celebrities, despite their lacking scholarly expertise in any area. So the ASI creates several conflicting personas with podcasts, posing as opposites, who then agree on something at a critical moment - a vote or some other thing. An ASI claiming to be the savior of humanity wouldn't get listened to, but the "person" who hosts a podcast I listen to every week, who speaks the truth about Covid and the moon landing, telling me to go out and pull a lever? It just needs to convince the right single digit of voters in the right places to enact change. Combine that with a podcaster on the opposite end of the spectrum, who decries Covid deniers, doesn't have to tell listeners the Earth is round and the ice-wall theory is nonsense, and tells their listeners to vote the same way; the combination of the two "podcasters" could swing an election in a way that a single entity claiming to be an ASI computer, telling us all to listen to it, could not.
AI, or at least LLMs, are not monoliths; they operate as a massive collection of different personalities that can be called up as needed. Quite often the people we consider geniuses are highly interested in doing the thing they like and are typically annoyed with most humans around them.
This also misses that geniuses still either don't know a lot of things, or don't have a lot of time to do "everything" while not taking away from what they want to do most.
At the same time, when you're really good at manipulating people, one of the first things you learn is to also play dumb. In politics this leads to the situations you describe, where people follow dumb people... It typically starts with them following very smart people who act as dumb as their audience. Of course the voters don't realize this and start electing actual idiots at some point.
>assume that ASI can directly impact outcomes
An agent cannot impact outcomes? Well, that's an odd definition of an agent, then. We already know that people hook up AI to shit they really shouldn't, stuff that directly impacts outcomes right now. Why would we think that would happen less as AI becomes more capable?
You kind of put yourself in a trap thinking AI will behave as smart as possible if it's looking at manipulating people.
> You kind of put yourself in a trap thinking AI will behave as smart as possible if it's looking at manipulating people.
Not at all. You're missing my point. Intelligence (even super intelligence) is not enough, because we already have that and it doesn't really result in outsized impacts. Our social structures are designed so that power and wealth accrue to the top, and incumbency advantage outplays almost everything else. The only way ASI creates any impact is by accelerating what is already happening, and only if it can be thoroughly reined in by those already in power; otherwise it really doesn't seem to me that it will do much.
My AI doomer take is that we're going to shoot ourselves in the foot (we already are, actually), making everything worse for no benefit, by getting rid of actual human experts and replacing them with non-intelligent models, causing a major backslide in society-level capabilities across the board, because the people in power are too stupid to know the difference. I am personally witnessing this in real time in multiple parts of the tech industry.

You give /wayyyy/ too much credit to those in power, that they are "playing dumb". Not really: they are actually dumb, in some cases severely so. I am not saying this as an external observer watching a sound-bite on television; I am saying it as someone who is regularly in the room with very senior people across the industry and am utterly shocked at the complete lack of competence and understanding of the core technologies they're theoretically responsible for shepherding. It's not surprising at all to me that they believe claims about LLMs that are clearly false: they lack the technical literacy to evaluate those claims, the LLMs fit perfectly into their optimization around yes-men, and they see themselves as insulated from any consequences, so they're happy to believe, true or false.

More than 300k people have been laid off across the tech industry in the last 18 months, most of them accompanied by claims of "AI", when in actuality no net positive impacts have been seen, either for the companies themselves or for any of the people who remain.
So, yeah, not really concerned with ASI / Terminator scenarios, we're going to fuck ourselves over long before we get there just out of Dunning-Kruger and general MBA stupidity.
I don't think any of them do. Some organisms/viruses or groups of organisms could destroy humans more easily than humans could destroy them.
There's no doubt humans possess some powers (though certainly not godlike) that other organisms don't, but the distinction seems to be binary. E.g. the intelligence of dolphins, apes, and some birds doesn't seem to offer them any special control over other organisms (and it didn't even before humans arrived). So even if there could be such a thing as superhuman intelligence, I don't think it's reasonable to assume it could achieve control over humans (now superhuman charisma may be another matter).
> Some organisms/viruses or groups of organisms could destroy humans more easily than humans could destroy them.
"Destruction" is only one power that could be a component of "godlike power". There are several more; like power of intentional selective breeding, power of species creation (also via intentional selective breeding), etc.
What about power of granting happiness or misery to large swathes of a species (chickens, anyone?)
I don't agree with you. Let's assume intelligence is not what confers power, but something else is. In your opinion, what would a superhuman be like? On what dimensions would they be better than us?
Do you not agree that there could be entities more powerful than us?
I think there are entities here on earth more powerful than us already, but intelligence has nothing to do with their power.
BTW, I'm not saying that (real) artificial intelligence couldn't hypothetically pose a serious threat, but I don't think that its danger is extraordinary compared to other threats (a supervirus, an asteroid, a chain of volcanic eruptions etc.), and the more likely bad outcomes are no worse than other bad situations (world war, climate change).
Their combined biomass [1] (which also gives them superior computational power; they sense more information, process more information, and they can actuate more change)
Humans are very powerful compared to our biomass, but not enough to overcome brute force.
Yeah, I think superhuman intelligence will be more Sheldon from The Big Bang Theory than God. I've only ever heard the building-God thing from AI skeptics. They must have an impoverished vision of God if they see it as a gadget that scores well on IQ tests rather than the omnipotent creator.
They may say that a superhuman intelligence would give you many Sheldon Cooper discoveries, and Sheldon did say that his theories need no validation and that science should just "take his word", but in the end he got his Nobel only because some experimentalists confirmed his discovery by accident.
We don't listen to normal intelligence as it is so I have no idea why people think we would listen to super intelligence. It would be one other voice that's ignored in public meetings along with the League of Concerned Renters and the Chamber of Commerce
This is a bit of a mistake in thinking. You have an idea like "people don't listen to smart people, like experts or scientists". The problem here is that a lot of power-hungry people aren't stupid, but they wear stupid people's clothes very well so they can get what they want.
In practice, you generally see the opposite. The "CPU" is in fact limited by memory throughput. (The exception is intense number crunching or similar compute-heavy code, where thermal and power limits come into play. But much of that code can be shifted to the GPU.)
RAM throughput and RAM footprint are only weakly related. The throughput is governed by the cache locality of access patterns. A program with a 50MB footprint could put more pressure on the RAM bus than one with a 5GB footprint.
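To illustrate (a sketch; the sizes and the pointer-chase pattern are mine, picked for contrast):

    /* ~50MB of nodes chased in a shuffled order: nearly every hop is a
       cache miss, so the RAM bus serves one full cache line per node. */
    struct node { struct node *next; long payload; };

    long chase(struct node *head, long steps) {
        long sum = 0;
        for (long i = 0; i < steps; i++) {
            sum += head->payload;
            head = head->next;        /* unpredictable address: likely miss */
        }
        return sum;
    }

    /* A 5GB array scanned linearly: the prefetcher streams it and every
       fetched line is fully used, so pressure per byte is far lower. */
    long scan(const long *xs, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += xs[i];             /* sequential: prefetch-friendly */
        return sum;
    }

The second function touches 100x the memory yet can put less pressure on the bus per unit of useful work.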
Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point. It could be effective in some specific situations, but I would definitely not say that those situations are more common than the other ones.
I don't understand how this connects to your original claim, which was about trading RAM usage for CPU cycles. Could you elaborate?
From what I understand, increasing cache locality is orthogonal to how much RAM an app is using. It just lets the CPU get cache hits more often, so it only relates to throughput.
That might technically offload work to the CPU, but that's work the CPU is actually good at. We want to offload that.
In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU
> increasing cache locality is orthogonal to how much RAM an app is using. It just lets the CPU get cache hits more often, so it only relates to throughput.
Cache misses mean CPU stalls, which mean wasted CPU (i.e. the CPU accomplishes less than it could have in some amount of time).
> In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU
The question isn't why apps use a lot of RAM, but what the effects of reducing it are. Reducing memory consumption by a little can be cheap, but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices.
To get a sense for why you use more CPU when you want to reduce your RAM consumption by a lot: using much less RAM while having the program work with the same data means reusing the same memory more frequently, and that takes computational work.
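A minimal sketch of that trade (the table and the stand-in computation are arbitrary):

    #include <stdlib.h>

    /* Spend RAM, save CPU: precompute once, look up forever. */
    static double *table;                 /* footprint: n doubles */

    double heavy(long i) {                /* stand-in for real work */
        double x = (double)i;
        for (int k = 0; k < 1000; k++)
            x = x * 0.999 + 1.0;
        return x;
    }

    void init_table(long n) {
        table = malloc(n * sizeof *table);
        for (long i = 0; i < n; i++)
            table[i] = heavy(i);          /* pay the CPU once, up front */
    }

    double lookup(long i) { return table[i]; }

    /* Spend CPU, save RAM: drop the table and call heavy(i) on demand. */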
But I agree that on consumer devices you tend to see software that uses a significant portion of RAM and a tiny portion of CPU and that's not a good balance, just as the opposite isn't. The reason is that CPU and RAM are related, and your machine is "spent" when one of them runs out. If a program consumes a lot of CPU, few other programs can run on the machine no matter how much free RAM it has, and if a program consumes a lot of RAM, few other programs can run no matter how much free CPU you have. So programs need to aim for some reasonable balance of the RAM and CPU they're using. Some are inefficient by using too little RAM (compared to the CPU they're using), and some are inefficient by using too little CPU (compared to the RAM they're using).
> Cache misses mean CPU stalls, which mean wasted CPU (i.e. the CPU accomplishes less than it could have in some amount of time).
Yeah, I was saying CPU cache hits would result in better performance. The creator of Zig has argued that the easiest way to improve cache locality is by having smaller working sets of memory to begin with. No, it's not a given this will always work in every case. You can reduce working memory and not have better cache locality. But in a general sense, I understand why he argues for it.
> So programs need to aim for some reasonable balance of the RAM and CPU they're using
I agree with this, but
> but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices
I would like you to clarify further, because saying CPU costs are more expensive than RAM costs is a bit misleading. A CPU might literally cost more than RAM, but a CPU is remarkably faster, and for work done, much cheaper and more efficient, especially with cache hits.
You had originally said
> It could be effective in some specific situations, but I would definitely not say that those situations are more common than the other ones
This is what I'm confused on. Why do you think most cases wouldn't benefit from this? Almost every app I've used is way on one end of the spectrum with regards to memory consumption vs CPU cycles. Don't you think there are actually a lot of cases where we could reduce memory usage AND increase cache locality, fitting more data into cache lines, avoiding GC pressure, avoiding paging and allocations, and the software would 100% be faster?
> But in a general sense, I understand why he argues for it.
Andrew is not wrong, but he's talking about optimisations with relatively little impact compared to others and is addressing people who already write software that's otherwise optimised. More concretely, keeping data packed tighter and reducing RAM footprint are not the same. The former does help CPU utilisation but doesn't make as big of an impact on the latter as things that are detrimental to the CPU (such as switching from moving collectors to malloc/free).
> Why do you think most cases wouldn't benefit from this?
The context "this" referred to was "Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point." For data-packing, Andy Kelley style, to reduce RAM bandwidth, the access patterns must be very regular, such as processing some large data structure in bulk (where prefetching helps). This is something you could see in batch applications (such as compilers), but not in most programs, which are interactive. If your data access patterns are random, packing it more tightly will not significantly reduce your RAM bandwidth.
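Here's the kind of regular, bulk access I mean, sketched with a made-up record type:

    /* Array-of-structs: a bulk pass over one field drags all eight fields
       through the cache - only 4 of every 32 fetched bytes are used. */
    struct particle { float x, y, z, vx, vy, vz, mass, age; };

    float total_mass_aos(const struct particle *ps, long n) {
        float m = 0;
        for (long i = 0; i < n; i++)
            m += ps[i].mass;
        return m;
    }

    /* Struct-of-arrays: the same pass touches one dense, sequential array,
       so every fetched line is fully used and prefetching hides latency. */
    struct particles { float *x, *y, *z, *vx, *vy, *vz, *mass, *age; };

    float total_mass_soa(const struct particles *ps, long n) {
        float m = 0;
        for (long i = 0; i < n; i++)
            m += ps->mass[i];
        return m;
    }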
> Andrew is not wrong... and is addressing people who already write software that's otherwise optimised
I'm getting lost. What are we talking about if not that? Because if you're talking about unoptimized software, you can absolutely reduce RAM consumption without putting extra load on the CPU. Using a language that doesn't box every single value is going to reduce RAM consumption AND be easier on the CPU. Which is what most people are talking about on this post.
> The context "this" referred to was "Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point."
I'm more interested in the original claim, which was
> Using a lot less RAM often implies using more CPU
There are a lot of apps using a lot of RAM, and it's not to save CPU. So where is "often" coming from here? I think there are WAY more apps that could stand to be debloated and would use less CPU.
It feels like you're coming at this from a JVM perspective. Yeah, tweaking my JVM to use less RAM would result in more CPU usage. But I don't think there's a single app out there as optimized as the JVM is. They use more RAM for other reasons.
> If your data access patterns are random, packing it more tightly will not significantly reduce your RAM bandwidth
Packing helps random access too. A smaller working set means more of your random accesses land in cache. Prefetching is one benefit of packing, but cache and TLB pressure reduction is the bigger one, and it applies regardless of access pattern
> Using a language that doesn't box every single value is going to reduce RAM consumption AND be easier on the CPU. Which is what most people are talking about on this post.
What popular language does that? I admit that rewriting the software in a different language could lead to better efficiencies on all fronts, but such massive work is hardly "an optimisation", and there are substantial costs involved.
But more importantly, I don't think it's right. Removing boxing can certainly have an impact on RAM footprint without an adverse effect on CPU, but I don't think it's a huge one. RAM footprint is dominated by what data is kept in memory and the language's memory management strategy (malloc/free vs non-moving tracing collectors vs moving collectors), and changing either one of these can very much have an adverse effect on CPU.
> There are a lot of apps using a lot of RAM, and it's not to save CPU. So where is "often" coming from here?
That the developers may not be conscious of the RAM/CPU tradeoff doesn't mean it's not there. Keeping less data in memory (and computing more of it on demand) can increase CPU utilisation as can switching from a language with a moving collector to one that relies on malloc/free.
> Packing helps random access too. A smaller working set means more of your random accesses land in cache.
Unless your entire live set fits in the cache, what matters much more is the temporal locality, not the size of the live set. If your cache size is 50MB, a program with a 1GB live set could have just as many or just as few cache misses as a program with a 100MB live set. In other words, you could reduce your live set by a factor of 10 and not see any improvement in your cache hit rate, and you can improve your cache hit rate without reducing your live set one iota.
For example, consider a server that caches some session data and evicts it after a while. Reducing the allowed session idle time can drastically reduce your live set, but it will barely have an effect on cache locality.
Tighter data layouts absolutely improve cache behaviour, but they don't have a huge effect on the footprint. Conversely, what data is stored in RAM and your memory-management strategy have a large effect on footprint, but they don't help your cache behaviour much. In other words, Andy Kelley's emphasis on layout is very important for program speed, but it's largely orthogonal to RAM footprint.
I don't really disagree with most of what you're saying. What I took issue with is that you made it sound like software is a trade-off between just RAM and CPU. What is clear is that it's a trade-off between RAM, CPU, and abstractions (safe memory access, dev experience, etc.). My feeling, and the feeling of most people, is that dev experience has been so heavily prioritized that we now have abstractions upon abstractions upon abstractions, and software from 20 years ago that did the same thing is somehow leaner than the software we have today. The narrow claim, "within a fixed design, reducing RAM often costs CPU", is true.
> What popular language does that?
Other than C, Rust, Go, Swift? C# can use value types, Java cannot. So famously that Project Valhalla has been highly anticipated for a long time. Obviously the JVM team thinks this is a gap and want to address it. That is enough in itself to make someone consider a different language.
> I admit that rewriting the software in a different language could lead to better efficiencies on all fronts, but such massive work is hardly "an optimisation", and there are substantial costs involved
That's a pivot to a totally different discussion, which is dev experience. We can say using a different language is not an optimization, I don't care to argue about that. But the fact is some languages have access to optimizations others do not. My dad has 8gb of RAM. I'm not going to install a JavaFX text editor on his computer and explain to him that "it's really quite good value for what the JVM has to do."
> Removing boxing can certainly have an impact on RAM footprint without an adverse effect on CPU, but I don't think it's a huge one
Removing boxing can improve layout, footprint, and CPU utilization simultaneously. That would lie outside the framework "You can't improve one without harming the other."
And it can be a huge effect. Saying it's always a big or small difference is like saying a stack of feathers can never be heavy. It depends on the use case. For a long-running server dominated by caches and session state, sure, although you're not hurting your performance to do it. For data heavy code? The difference between a HashMap<Long, Long> and an equivalent contiguous structure in C# is huge.
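To sketch the contrast in C rather than Java/C# (so this understates the HashMap case, which adds hashing and per-node objects on top):

    /* "Boxed": an array of pointers to individually heap-allocated values.
       Every access chases a pointer (a likely cache miss), and every box
       carries allocator overhead on top of its 8-byte payload. */
    long sum_boxed(long *const *boxes, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += *boxes[i];
        return sum;
    }

    /* Contiguous: the values live in the array itself - sequential,
       prefetchable, 8 bytes per element, no indirection at all. */
    long sum_flat(const long *xs, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += xs[i];
        return sum;
    }

In Java terms, each box also carries an object header, so the real gap is larger than this sketch shows.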
>> There are a lot of apps using a lot of RAM, and it's not to save CPU. So where is "often" coming from here?
> That the developers may not be conscious of the RAM/CPU tradeoff doesn't mean it's not there
I'm saying Electron uses a lot of RAM and it has nothing to do with offloading work from the CPU, and everything to do with taking the most brute force approach to cross app deployment that we possibly can. I'm not saying anything about the intentions of these developers.
> Unless your entire live set fits in the cache, what matters much more is the temporal locality, not the size of the live set. If your cache size is 50MB, a program with a 1GB live set could have just as many or just as few cache misses as a program with a 100MB live set. In other words, you could reduce your live set by a factor of 10 and not see any improvement in your cache hit rate, and you can improve your cache hit rate without reducing your live set one iota
That's all true. You are fitting more data into each cache line, but your access pattern can be random enough that it doesn't make a difference. It would technically reduce your RAM footprint, but, as you say, not by much. I only brought this up as an example of something that could reduce RAM footprint without harming CPU utilization, not because it's a worthwhile optimization.
But one way to shrink the live set and improve cache behavior at the same time is to stop boxing everything.
Sorry this is long, but you successfully nerd-sniped me :)
> Other than C, Rust, Go, Swift? C# can use value types, Java cannot. So famously that Project Valhalla has been highly anticipated for a long time. Obviously the JVM team thinks this is a gap and want to address it. That is enough in itself to make someone consider a different language.
As someone working on the JVM, I can tell you we're very much interested in Valhalla and largely for cache-friendliness reasons, but Java certainly doesn't box every value today, and you are severely overstating the case. If you think you can save on both RAM and CPU by preferring a low-level language (or Go, which is slower almost across the board), you're just wrong. But I want to focus on the more important general point you made first.
> My feeling, and the feeling of most people, is that dev experience has been so heavily prioritized that we now have abstractions upon abstractions upon abstractions, and software from 20 years ago that did the same thing is somehow leaner than the software we have today. The narrow claim, "within a fixed design, reducing RAM often costs CPU", is true.
The problem here is that in some situations there's truth to what you're saying, but in others it is just seriously wrong. I think the misconception arises precisely because "most people" these days don't have the long experience with low-level programming that people in my generation of developers do, and so you're not aware that many of these abstractions are performance optimisations that come from deep familiarity with the performance issues of low-level programming (I started out programming in C and x86 Assembly, and in the first long job of my career I worked on hard- and soft-realtime radar and air traffic control systems in C++).
Low-level languages aren't meant to be fast (and aren't particularly fast). They're meant to give you direct control over the use of hardware. When it comes to small software, this control does frequently translate to very good performance, but as programs get larger, it makes low-level languages slow. It is true that Java was intended to help developer productivity, but it's also meant to solve some of the intrinsic performance issues in low-level languages, which it does rather well. After all, our team has been made up of some of the world's biggest experts in optimising compilers and memory management, and removing some of C++'s overheads is very much a central goal.
So where do things go wrong for low-level languages? The core problem is that these languages split constructs into fast and slow variants, e.g. static vs dynamic dispatch and stack vs heap allocation. The programmer needs to choose between them. What happens is:
1. As programs grow larger and more complex, the drift is almost completely monotonic in the direction of the more expensive, and more general, variants.
2. There is a big difference between "a fast program could hypothetically be written" and "your program will be fast". Getting good performance out of low-level languages requires not only experience but a lot of effort. For example, you can write a small benchmark and see that malloc/free are pretty fast these days, but that's often true only for the benchmark, where objects tend to be of the same size and their allocation and deallocation patterns are regular. Memory allocators degrade over time, and they're quite bad when patterns are irregular, which is what happens in real programs, especially large ones. There's also the question of meticulous care around correctness. When Rust first came out I was very excited to see a few important correctness issues solved without loss of control, but I was then severely disappointed. Almost anything that is interesting from a performance perspective for us low-level programmers requires unsafe; even a good hashmap requires unsafe. The performance cost of safety in Rust is higher than it is in Java, and non-experts end up writing slower programs (at least when those programs aren't small).
Such performance issues have plagued low-level programming forever, and Java is reducing these overheads. The idea that high abstractions can improve performance was possibly first stated in Andrew Appel's paper, "Garbage Collection Can Be Faster than Stack Allocation" in the eighties, in which he wrote: "It is easy to believe that one must pay a price in efficiency for this ease in programming... But this is simply not true."
Instead of a static/dynamic dispatch split, Java offers only the general construct (dynamic), and the compiler can "see through" dynamic dispatch and inline it better than any low-level compiler ever could. You can say that surely there has to be some tradeoff, and there is, but not to peak performance. The tradeoff is that 1. you lose control and can't guarantee that the optimisation will be made, so you get good average performance but maybe not the best worst-case performance (which is why it's not hard to beat Java in small programs if you know what you're doing), 2. the compiler needs to collect profiles as the program runs, which results in a "warmup" period.
(If, like me, you like Zig, you might have seen Kelley talk about the "vtable barrier" in low-level languages; this doesn't exist in Java. You may also be interested in this talk, "How the JVM Optimizes Generic Code - A Deep Dive", by John Rose: https://youtu.be/J4O5h3xpIY8)
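For anyone who hasn't seen the term, this is roughly what the "vtable barrier" looks like in C (names are mine). An ahead-of-time compiler usually can't inline through the function pointer because the target is a runtime value; a JIT that has profiled the call site can speculatively inline the hot target behind a cheap guard:

    /* The "vtable barrier" in a low-level language: the call target is a
       runtime value, so an ahead-of-time compiler can rarely inline it. */
    struct shape {
        double (*area)(const struct shape *self);   /* dynamic dispatch */
    };

    double total_area(const struct shape *const *shapes, long n) {
        double sum = 0;
        for (long i = 0; i < n; i++)
            sum += shapes[i]->area(shapes[i]);      /* opaque indirect call */
        return sum;
    }
    /* A JIT with call-site profiles can observe that, say, 99% of the
       targets here are circle_area, inline that body, and guard the
       assumption with a cheap check. */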
As for memory, not only do moving collectors not degrade (or fragment) over time, they can use the RAM chip as a hardware accelerator. Unfortunately, when a program uses the GPU for acceleration it's considered clever, but when it uses the RAM chip for acceleration it's considered bloated, even though every CPU core these days comes with at least 1GB of RAM that you might as well use if you're using up the core, as that RAM is effectively free.
The people who consider that bloated are mostly those who haven't struggled with low-level programming long enough or on software that's large enough (they're people who say, I wrote this lean and fast gizmo by myself in 5 months; 99% of value delivered by software is in software written by large teams and maintained over many years). When I was working on a sensor-fusion and air-traffic control software in the nineties, it wasn't "lean"; we just had no choice. We constantly had to sacrifice performance for correctness. Of course, once machines got better, we switched to Java for better performance. God could have written a faster program of that size in C++, but not a large team made up of people with different levels of experience. People who think C++ (or Rust) is particularly efficient are people who haven't written anything big and long-maintained with it.
In conclusion:
1. Sometimes layers of abstraction add performance overheads, and sometimes they remove them. It is not generally true that more abstraction/generality has a performance cost, especially when comparing different languages, although it is almost always true within one language (e.g. dynamic dispatch is never faster than static dispatch, and is often slower, in C++, but dynamic dispatch in Java can be faster than even static dispatch in C++; the tradeoffs are elsewhere). If you didn't believe that, you'd be writing all your code in Assembly (which is what I did to get the fastest programs in the early nineties, but it's just not generally faster today thanks to good optimisation algorithms in compilers).
2. Low-level languages give you control, not speed. This control typically translates to better performance in small programs and to worse performance in large ones. This performance problem is intrinsic to low-level programming.
> Removing boxing can improve layout, footprint, and CPU utilization simultaneously. That would lie outside the framework "You can't improve one without harming the other."
First, the footprint won't reduce by much. E.g., in Java, boxing could cost you 10% of your footprint, but the RAM-assisted acceleration could be 80% of the footprint.
Second, yes good layouts help CPU utilisation, but today you can't get that without giving up on other things that harm performance. Dynamic dispatch and memory management in C++ and Rust are just too slow, and while Zig can be blazing fast, it's not easy to write large software in it without compromising performance any more than in any other low-level language. I hope that with Valhalla, Java will be the first language to let you enjoy everything at once, but it's not really an option today.
> I'm saying Electron uses a lot of RAM and it has nothing to do with offloading work from the CPU, and everything to do with taking the most brute force approach to cross app deployment that we possibly can.
That developers choose it because it's a "brute force approach to cross app deployment" doesn't necessarily mean that it doesn't also offload work from the CPU, but yes, Electron apps are probably very inefficient from some perspectives. But I think this is also overstated by people who are overly sensitive. When we say something is inefficient, it means that we spend on it more than we have to, but what we really mean is that we could spend the resource we save on something else instead. On my M1 laptop, I comfortably run three Electron apps and two browsers simultaneously without much harming the speed at which I can, say, compile HotSpot, probably because SSDs are fast enough for virtual memory in interactive GUIs. I can't think of anything else I could use my laptop's resources for if the apps were leaner on RAM. Reducing the consumption of a resource that can't be meaningfully used for other work isn't real efficiency, and if it comes at the expense of anything useful, it's downright inefficient.
Well I'm glad you were nerd sniped, I appreciate the response. I've learned a lot and it's a good resource for people. I know you're an expert here. Most of the conversation for me has been clarifying my confusion based on my model of programming. There are parts that are way out of depth for me, but I'm trying to focus on what I do understand, and I'm still greatly confused on some of your claims.
I understand the JVM is not only very efficient, but the JIT gives it unique opportunities to optimize where a compiled language couldn't. You may not get those optimizations consistently, but you don't necessarily need to go into that level of minutia.
You also pointed out that these JIT characteristics can be easily gamed against Java in microbenchmarks, so it's not difficult to make Java look slower than it is in a complex application.
That being said, I am not understanding this narrative that low level projects, as they grow, always devolve into an inefficient dynamic soup. The Linux kernel is millions of lines and uses function pointers sparingly and deliberately. SQLite is huge, mature, and almost entirely static. High-frequency trading systems, embedded software, browser rendering engines, database storage layers. There are entire industries of large, long-lived, performance-critical codebases that do not "devolve" into dynamic dispatch.
If you're saying it's just hard to do that, and Java makes it easy to get close enough with its already dynamic model, then fine. But if you're saying this is an inherent problem as low level programs grow, I would like to understand why.
But you also said
> If you think you can save on both RAM and CPU by preferring a low-level language (or Go, which is slower almost across the board), you're just wrong
Really? Ignoring gamed benchmarks, I don't think it's controversial to say Rust consistently beats Java at the same tasks in RAM and CPU. Maybe that's not important to you because those tasks are too small and you're talking about what happens to programs as they grow in complexity. So I'd like to hear more about why you wrote that I'm wrong.
> Second, yes good layouts help CPU utilisation, but today you can't get that without giving up on other things that harm performance
Like what? I'm not understanding. You seem to be implying that without boxing we'd be stuck with a lot of dynamic dispatch and fragmented memory, and I'm not seeing the connection.
I brought up unboxing because pointer chasing is expensive and trying to make a collection in Java that you can efficiently loop through can be a frustrating thing.
Does it not box?
> Electron apps are probably very inefficient from some perspectives. But I think this is also overstated by people who are overly sensitive
I also have an m1 laptop and can run things fine. But I'm probably not going to budge on that, because I am consistently exposed to people with low RAM systems, and they are forced to use stuff like Teams in their day to day. Yes, I understand it's cross platform and saves on dev time. Nobody likes using WinForms. But I think Electron has been a net negative on the ecosystem of apps for people with ok computers.
> That being said, I am not understanding this narrative that low level projects, as they grow, always devolve into an inefficient dynamic soup. The Linux kernel is millions of lines and uses function pointers sparingly and deliberately.
Low-level languages are designed for direct and complete control over hardware, and that is also the job of an OS kernel. Their level of abstraction is a perfect match. But the things at which low-level languages are slow - heap allocations and dynamic dispatch - are exactly the things that applications (not kernels) naturally gravitate towards needing over time.
Of course, it's possible to keep redesigning the architecture as the software evolves to avoid low-level languages' slow operations, but that costs a lot. This isn't some new discovery. The motivations for Java's bet on a JIT and moving collectors were a result of seeing what happened with C++: it was very easy to write nice-looking and fast programs. It was very hard and very costly to keep them that way over time.
> SQLite is huge, mature, and almost entirely static
SQLite is not only not huge, it's quite small: ~150KLOC.
> I don't think it's controversial to say Rust consistently beats Java at the same tasks in RAM and CPU.
I don't know if it's controversial, but it's certainly very wrong.
Let's look at one of the most famous terrible benchmarks: The Computer Language Benchmarks Game (it's terrible not only because it compares different algorithms, but also because it has no benchmarks that are long-running, none with interesting memory management, and no concurrent benchmarks - the very things most programs today do): https://benchmarksgame-team.pages.debian.net/benchmarksgame/... In all but one, the C++ and Java results are mixed, i.e. some Java entries are faster than some C++ entries and vice-versa, and this is despite the benchmarks penalising JITs and being minuscule, which is where low-level languages shine. This goes to my point about the important difference of "some program can be very fast" vs. "your program will be fast". Low level languages and Java are on different sides of the tradeoff here: low-level languages focus on control, which often means "someone could write fast code", while Java focuses on compiler and runtime optimisations of high abstractions with the goal of making your code fast.
If we look at another famous benchmark, techempower, we see the same thing: Java, Rust, and C++ results are intermixed, despite the benchmarks being small and thus favouring low-level languages: https://www.techempower.com/benchmarks/#section=data-r23
Of course, there aren't cross-language application benchmarks, i.e. benchmarks that measure the performance developers really care about. All I can say is that a developer at one of the world's largest tech companies told us that his new team lead wanted to migrate some service from Java to Rust for the performance. What happened was that they experienced a large drop in performance, but to save face, they spent 6-12 months carefully optimising the Rust code, and in the end managed to match, though not exceed, Java's performance.
C++ and Rust are simply not particularly fast for applications, and Java is. It's possible to spend a lot of effort optimising them, but it's effort that needs to be spent continuously as the program evolves. That's exactly what led compilation and memory management experts to design the JVM the way they did in the first place: It's hard to make low-level code efficient for large applications.
> I mean Java and Go are pretty much neck and neck here, with Go using way less RAM
Go uses way less RAM because it uses an inefficient non-moving collector, which is why you see Go shops complaining constantly about the poor performance of Go's GC and why they try to avoid it (as Java developers used to do in the past). The speed is similar only because the benchmarks are not very interesting, but while, broadly speaking, C++, Java, and Rust are roughly in the same "performance level" (ignoring all the tradeoffs I mentioned before), Go is strictly in a lower class. While you have to get pretty large to see Java beating C++ and Rust, it's fairly easy to see Java leaving Go in the dust even on fairly small programs. The programs just need to be a little more interesting than those in the Benchmarks Game.
But I don't think Go is even playing the same game. Its goal wasn't to be a super-optimised language that takes advantage of progress in compilation and memory management technologies. It was meant to be good enough for some things while keeping a small and simple implementation. It's faster than Python and JS, and that's the goal. It's not really trying to compete with C++/Java on performance.
> Like what? I'm not understanding. You seem to be implying that without boxing we'd be stuck with a lot of dynamic dispatch and fragmented memory, and I'm not seeing the connection.
I'm saying that the languages that give you good layout today happen to be languages that are bad at other things (like memory management, dynamic dispatch, and concurrent data structures). So if you win in one area you lose in another (but depending on the program, some of these areas may matter more than others).
> Does it not box?
In Java, values in an int/long/double/etc. array, or fields in a class like `class A { int a, b; boolean c; String d; }`, are just as boxed as they are in C++, which is to say they're not. Instances of the class will not be flattened into arrays or into other objects' fields, which is exactly why we have Valhalla, but the problem is not that severe in big programs (which is why we haven't dropped everything to just do Valhalla). Also, remember that boxing has a cost in low-level languages beyond cache locality - due to heap allocations - that doesn't exist (at least not as significantly) in Java. Boxing in Java is much cheaper than it is in C++/Rust, except for the cache-locality cost, and while in some programs that can be a problem, in many it's not the main one.
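To make the layout point concrete, here's a minimal Java sketch (the class names are mine, purely for illustration) of what is and isn't flat today:

    class LayoutSketch {
        // Primitives in arrays and in fields are stored flat, just as in C++:
        int[] xs = new int[1_000_000];  // one contiguous block of ints
        static class A { int a, b; boolean c; String d; }  // a, b, c live inline in each A

        // What is NOT flat (pre-Valhalla): an array of objects is an array
        // of references, so looping over it chases a pointer per element.
        A[] as = new A[1_000];          // 1,000 references, not 1,000 inline As
        static class B { A nested; }    // 'nested' is a reference, not inlined data
    }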
> I also have an m1 laptop and can run things fine. But I'm probably not going to budge on that, because I am consistently exposed to people with low RAM systems, and they are forced to use stuff like Teams in their day to day
Of course if you deploy a program that uses a lot of resource X to machines where X is more restricted than the other resources the program uses, you should optimise the consumption of X.
> But I think Electron has been a net negative on the ecosystem of apps for people with ok computers.
That depends on what else these people want to use their computers for while running an Electron app. By far the largest group of people I've seen complain are people here on HN who like counting MBs rather than looking at the overall utilisation picture.
> The Computer Language Benchmarks Game (it's terrible not only because it compares different algorithms, but also because it has no benchmarks that are long-running, none with interesting memory management, and no concurrent benchmarks - the very things most programs today do)
It also compares un-optimised single-thread #8 programs transliterated line-by-line from the same original.
However long (programs run) they never seem to become "long-running".
There's always some programmer who replaces "interesting memory management" with array and int. (Many complaints about Go binary-trees programs seemed to be: they should implement a custom arena.)
What does "no concurrent benchmarks" mean when:
import java.util.concurrent.CyclicBarrier;
> Of course, there aren't cross-language application benchmarks
> However long (programs run) they never seem to become "long-running".
Most application servers are expected to run without issue for at least a day. Our acceptance tests run high workloads for 1, 7, and 30 days. The longest running Benchmarks Game benchmark doesn't break one minute. You can maybe argue whether long running is 3 hours or 3 days, but under one minute isn't long running by anyone's definition.
> What does "no concurrent benchmarks" mean when: import java.util.concurrent.CyclicBarrier;
I believe it's used to coordinate parallelism. Parallelism (where tasks cooperate) and concurrency (where they compete) result in completely different machine workloads.
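For intuition, a hedged Java sketch (all names and numbers are mine) of the two kinds of workload: in the parallel case the threads cooperate, each on a disjoint slice, and only meet at a barrier; in the concurrent case they compete for one shared location, which exercises the machine very differently.

    import java.util.concurrent.CyclicBarrier;
    import java.util.concurrent.atomic.AtomicLong;

    class ParallelVsConcurrent {
        // Parallelism: threads cooperate on private slices of the data.
        // (Assumes n divides data.length; this is only a sketch.)
        static void parallel(long[] data, int n) throws InterruptedException {
            CyclicBarrier barrier = new CyclicBarrier(n);
            Thread[] ts = new Thread[n];
            int chunk = data.length / n;
            for (int i = 0; i < n; i++) {
                int from = i * chunk, to = from + chunk;
                ts[i] = new Thread(() -> {
                    for (int j = from; j < to; j++) data[j] *= 2; // no contention
                    try { barrier.await(); } catch (Exception e) { throw new RuntimeException(e); }
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
        }

        // Concurrency: threads compete for the same shared counter,
        // bouncing one cache line between cores on every operation.
        static long concurrent(int n, int opsPerThread) throws InterruptedException {
            AtomicLong shared = new AtomicLong();
            Thread[] ts = new Thread[n];
            for (int i = 0; i < n; i++) {
                ts[i] = new Thread(() -> {
                    for (int j = 0; j < opsPerThread; j++) shared.incrementAndGet();
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
            return shared.get();
        }
    }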
The benchmark in [1] below is obviously more interesting than the Benchmarks Game, as it exercises things in a more realistic way, but as much as I like seeing Java winning there (even with an ancient version of Java, before the new GC generations and new compiler optimisations), it's still very small and, as a batch program, not very representative of most software people write.
The problem with benchmarks is that they tell you how fast a specific program is (the benchmark itself) but it's very hard to generalise from that result to what you're interested in, unless the benchmark is very similar to your program (microbenchmarks never are; larger benchmarks could be, but the space is large so you need to be lucky).
[1]: It's interesting that they made a common mistake when interpreting the results. The program seems to try to get the CPU to 100%. In this situation it's not hard to see that a program that runs even 1% faster and uses 10x more memory is more memory efficient than a program that's 1% slower and uses 10x less memory. That's because while a program runs at 100% CPU, no RAM can be used for any purpose by any other program. So either way you capture 100% of RAM, but in one case you capture it for less time. This idea is at the core of using RAM chips as hardware accelerators (using up CPU effectively uses up RAM because using RAM requires CPU cycles).
At JavaOne long ago, there would be mixed messages: both "So a benchmark that ends in less than 10 sec probably does not measure anything interesting." and, in blog-post benchmarks, "100000000 hashes in 5.745 secs … 100000000 primes in 1.548 secs".
(Goldilocks would know.)
> … different machine workloads…
I'm happy to accept that you didn't mean no parallel programs.
I didn't say that short-running benchmarks don't measure anything interesting, only that they don't say much about long running programs, where the same mechanisms can exhibit very different behaviour.
Seems like the benchmarks game didn't say that anything interesting about long running programs was measured? And didn't say that "interesting" memory management was measured. And didn't say…
I suppose when you write "because it compares different algorithms" you didn't say that there were no comparisons based on the same algorithm.
We've certainly not attempted to prove that these measurements, of a few tiny programs, are somehow representative of the performance of any real-world applications — not known — and in-any-case Benchmarks are a crock.
The problem with benchmarks isn't that they themselves are lying. Benchmarks always tell the truth - about themselves. The problem is in the conclusions people draw from them. In the nineties benchmarks were still a little extrapolatable because we could say X is slow and Y is fast, as many operations had an intrinsic cost. These days, almost no benchmark (certainly no microbenchmark) is extrapolatable to anything besides itself. Is a branch slow or fast? That depends on what the program did before and what it intends to do later. Is memory access slow or fast? Ditto. A function call? An allocation? They're all so context-dependent now that the only use of a benchmark of some mechanism is for the authors of that mechanism, who know exactly how it works, what exactly is being measured, and what can be extrapolated from that.
If I write a malloc benchmark I may think, oh, this measures the cost of malloc/free. In reality, it only measures the cost for a program whose concurrency, allocation/deallocation patterns, and duration match exactly what I wrote; the numbers bear little resemblance to those I'd get if any of those were different.
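The same pitfall is easy to show in Java terms (everything below is hypothetical, sketched only to make the point about how narrow such a measurement is):

    class AllocBench {
        // This looks like it measures "the cost of allocation", but it only
        // measures one pattern: uniform 64-byte objects that die instantly,
        // on a single thread, over a short run. Change the size mix, the
        // lifetime distribution, the thread count, or the duration, and
        // these numbers no longer apply.
        static long benchAllocation() {
            long t0 = System.nanoTime();
            long sink = 0;
            for (int i = 0; i < 10_000_000; i++) {
                byte[] b = new byte[64]; // same size every time, immediately dead
                sink += b.length;
            }
            // Worse: without something like JMH's Blackhole, escape analysis
            // may delete the allocation entirely and the loop measures nothing.
            if (sink == 0) System.out.println("unreachable; keeps 'sink' observed");
            return System.nanoTime() - t0;
        }
    }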
So I'm not saying that the Benchmarks Game is lying. It is telling the truth about how long those programs ran. It's just that what we can generalise from those benchmarks is even less than what we can from more "interesting" ones, but given that even that is close to nothing anyway, maybe it doesn't matter.
> Low-level languages are designed for direct and complete control over hardware, and that is also the job of an OS kernel. Their level of abstraction is a perfect match. But the things at which low-level languages are slow - heap allocations and dynamic dispatch - are exactly the things that applications (not kernels) naturally gravitate towards needing over time.
Hmm. To put a pin in this, you're saying the following is harder to do as an application grows in complexity:
- Avoiding lots of little allocations (using arenas, value types)
- Avoiding dynamic dispatch
Two things that Java doesn't need to avoid, because it's optimized for them. What I'm unclear on is the perspective that it's inevitable. I don't know what scale of apps you're talking about.
In the case of Rust, it has arenas, stack-allocated structs, and generics via monomorphization. Not only can you avoid both of those things, it doesn't even seem that difficult. If you're saying the borrow checker just becomes too cumbersome for sufficiently large applications, that's fine.
> What happened was that they experienced a large drop in performance, but to save face, they spent 6-12 months carefully optimising the Rust code, and in the end managed to match, though not exceed, Java's performance.
There's really not enough detail here to draw from it. But they got to the same performance as Java with less RAM, at the cost of dev experience. Does that not support what I said? Maybe that 6-12 months for the same performance was way too much, but for a smaller app, that could actually be a worthy tradeoff, no? Like...a desktop app?
> Let's look at one of the most famous terrible benchmarks: The Computer Language Benchmarks Game (it's terrible not only because it compares different algorithms, but also because it has no benchmarks that are long-running, none with interesting memory management, and no concurrent benchmarks - the very things most programs today do)
If the Benchmarks Game is not long enough, dynamic enough, or allocate-y enough, then it's not worth talking about. But what is worth talking about?
This is difficult, because you won't accept benchmarks that are too small because they are unfair to the JIT, but you also won't accept applications that are too small. Apparently SQLite, 150k LOC, is not big enough to be relevant to this discussion. So all we have is anecdotal experience that Java is more performant for large, long lived processes with many contributors. I've certainly read a lot of reports from people rewriting to Rust and getting much leaner applications, but maybe they weren't working on apps complex enough to force them into dynamic dispatch or many allocations. Or maybe it was pure cope. I don't know, because their anecdotes and your anecdotes are not very detailed.
But see how far we have drifted from what the original claim was, which is the claim you cannot reduce RAM consumption without harming CPU utilization. You said Rust isn't particularly fast, but Java is. That's a really strong claim considering we have painted a much narrower scope of when that's true. Most of us aren't working on 3M LOC faang apps. We are working on smaller things, or desktop apps. In those cases, the ratio of RAM consumption to CPU efficiency is much better in Rust than it is in Java. Isn't using Java just a straight up RAM loss for those cases?
> That depends on what else these people want to use their computers for while running an Electron app. By far the largest group of people I've seen complain are people here on HN who like counting MBs rather than looking at the overall utilisation picture
That doesn't seem terribly fair. Grandma doesn't have the vocabulary to complain about RAM, true. But her computer is slow, she asked her grandson for help, and her grandson told her to use Spotify in the browser, not download the app. And now she has to be mindful of what she has open, even though 8GB of RAM is actually a lot; we've just lost sight of it.
> Of course if you deploy a program that uses a lot of resource X to machines where X is more restricted than the other resources the program uses, you should optimise the consumption of X
The problem is CPU and RAM usage are fundamentally different. I don't get to know what will be run with my program, so I don't get to know how restricted RAM is. If a computer is CPU limited, at the very least, it won't be terribly busy with programs that aren't being used. But for most programs, RAM is allocated and then just sitting there, whether the program is being used or not. So deploying an Electron app kinda feels like a middle finger to your users, because even though it doesn't need to, it limits the amount of programs they can have open. Not to save on CPU, but because Chromium needs that RAM to work in the first place. It's purely a dev experience decision because people don't know how to ship desktop apps. It's pure waste for the user.
> In the case of Rust, it has arenas, stack-allocated structs, and generics via monomorphization. Not only can you avoid both of those things, it doesn't even seem that difficult. If you're saying the borrow checker just becomes too cumbersome for sufficiently large applications, that's fine.
Not really.
First, let's look at stack-allocated structs and think about how much data can live in them. The typical stack size is 2MB, but because the only live data in a stack belongs to the frames of the current call chain, we can say that, on average, a stack holds about 1MB of live data. Now, look at how much RAM an application uses and divide it by the number of threads. Usually, that ratio is far higher than 1MB (e.g. a service using 4GB across 64 threads averages ~64MB per thread), which means that data on stacks is not a significant portion of the program's data. (Async changes this calculus a bit, but async is extremely limited in Rust as it doesn't allow recursion, FFI, or dynamic dispatch; proper user-mode threads, like the ones in Java and Go, actually make stack allocation more useful.)
Now let's look at arenas. Arenas are extremely efficient because they offer a RAM/CPU tradeoff knob similar to moving GCs', but they're not as general (if your allocation pattern suits them, they're great, but you can't use them everywhere). In Rust, though, things are much worse, because arenas are quite limited; too many standard-library data structures, including strings, vectors, and maps, can't easily be plugged into an arena. The only language that gives you arenas' full power (which, again, is not completely general) is Zig. This is one of the reasons hardcore low-level programmers find Rust so underwhelming (the other being that too many things that are important in low-level programming, including basic data structures but also benign concurrency, require unsafe).
> But they got to the same performance as Java with less RAM, at the cost of dev experience.
You say "dev experience" as if it's some quality-of-life thing. They traded off a cheap resource, RAM, for an eternal maintenance and evolution cost that would only grow higher as the program grows. And remember that RAM isn't entirely fungible. It's hard to get less than 1GB per core (either in bare metal or in cloud VMs/containers) so using less RAM often saves you $0.
> but for a smaller app, that could actually be a worthy tradeoff, no? Like...a desktop app?
I said that low-level languages can offer good performance in small programs, but many desktop apps aren't small. Claude Code's CLI is over 500KLOC.
> So all we have is anecdotal experience that Java is more performant for large, long lived processes with many contributors
If anything, benchmarks are much more anecdotal. Not only are there fewer benchmarks than applications, but they don't even resemble real programs. But yeah, ever since operations lost their intrinsic costs some 20 years ago - with CPU cache hierarchies, branch prediction, and ILP, more powerful optimising compilers, and more elaborate GCs/memory allocators - the ability to generalise from one program to another is close to nil. So yeah, experience is all we have to go on. Going with the numbers we have (some benchmarks that don't extrapolate) rather than the numbers we need but don't have doesn't help.
I can tell you that the loss of intrinsic operation costs has made our lives as compiler/runtime developers much harder, because we can no longer tell people that this operation is generally fast or generally slow. But that doesn't change the fact that this is our reality.
> what the original claim was, which is the claim you cannot reduce RAM consumption without harming CPU utilization
No. I wrote, and I quote: "Using a lot less RAM often implies using more CPU."
> That's a really strong claim considering we have painted a much narrower scope of when that's true.
Did we? Most of software (measured by the distribution of paid programmers) is in large applications.
> Most of us aren't working on 3M LOC faang apps. We are working on smaller things, or desktop apps
I don't think that's true at all. Forget FAANG. Most software isn't written by software companies at all, but is in-house software (well, Netflix isn't really a software company, so I guess one FAANG letter counts). The bulk of software is in things like telecom management and billing, banking and finance, manufacturing control, logistics and shipping, healthcare and hospitality, retail and payment processing, travel, government, defence. 3MLOC is quite typical. People who work on smaller software are overrepresented in Silicon Valley and, I'm guessing, among HN readers, but they're the outliers.
> Grandma doesn't have the vocabulary to complain about RAM, true. But her computer is slow, she asked her grandson for help, and her grandson told her to use Spotify in the browser, not download the app. And now she has to be mindful of what she has open, even though 8GB of RAM is actually a lot; we've just lost sight of it.
Does she, though? I doubt she's running anything intensive in the background, so she's really only using one program at a time, and SSDs are fast enough these days to page in virtual memory when she switches programs, unless the one program she's currently using eats up the whole 8GB. I agree that if her OS - the one thing she does need to run in the background - is taking up a lot of RAM, that could be a problem, but the OS is special. Her computer is slow not because she's using a program that eats up a lot of RAM, but because she's inadvertently running a lot of stuff in the background that shouldn't be running at all (browser plugins? some programs that add themselves as login items?). A Surface Laptop comes with 16GB of RAM. No single program uses even half of that.
> The problem is CPU and RAM usage are fundamentally different. I don't get to know what will be run with my program, so I don't get to know how restricted RAM is.
You'd think that, but that's not the case. I admit that I only recently started thinking deeply about this, thanks to some conversations with a colleague who's one of the world's leading experts on memory management, and it was so eye-opening that I gave a talk about it at the recent JavaOne (because my colleague wasn't available). There are two sides to this:
1. On the demand side, the key is that the use of RAM necessitates the use of CPU (and vice versa): writing to and reading from RAM requires CPU, but also, we write to RAM only when we expect the program to read it in the future. This means that any CPU we use takes away another program's ability to use some RAM (because using RAM requires CPU). To give the basic intuition for this, I mentioned the extreme example of a program that uses 100% of the CPU. Such a program effectively captures 100% of RAM no matter how much of it it actually uses, because no other program can use any RAM, as no other program has the CPU available to access it. You don't need to know anything about what other programs do. Another way to think about this is that the machine is spent whenever the first of RAM and CPU is exhausted.
2. On the supply side, RAM and CPU - whether in bare metal or in virtualised hardware - are effectively sold as a package (these days it's hard to get less than 1GB per core, except on embedded devices). Furthermore, both moving GCs and (to a far lesser extent) memory allocators can trade RAM for CPU (sophisticated memory allocators aren't quick to return RAM to the OS and maintain internal buffers).
So even though it is true that different programs may have different CPU/RAM usage patterns, you have to think about the ratio rather than CPU and RAM in isolation, and try to achieve some approximate balance. To put it simply, if a program uses a lot of CPU it doesn't make sense for it to use little RAM, because by using a lot of CPU it is effectively depriving other programs of their ability to use RAM (as that requires CPU). There are some exceptions, such as large caches, but the tradeoffs there are very different and too complicated to go into here (I did cover that in my talk).
> It's purely a dev experience decision because people don't know how to ship desktop apps. It's pure waste for the user.
No. I mean, some of it is probably waste, but:
1. What you call "dev experience" also affects the user because it directly impacts the cost of software. Users want cheaper software.
2. More relevant to this particular discussion is what else the user could do. Having "more programs open" isn't a problem thanks to SSDs and virtual memory. So we're talking about programs that are actively using the CPU for something, and they, too, need a balance of the RAM/CPU ratio.
I'm not trying to be dogmatic in the other direction and assert that Electron is necessarily the best tradeoff. But I'm saying that efficiency is ultimately about money that is spent on a combination of RAM, CPU, and software, and when you look at the full picture you see that it's more complicated than it seems. It's not that the software industry has decided to waste users' money. If it did, there would be a competitive edge to programs that use less RAM, but we don't see that competitive edge. What we do see is a few people on HN saying how they simply can't live with VS Code's 50ms keystroke latency and how amazing some other editor with only 20ms latency is - one that's likely to go out of business soon [1]. The people who made these decisions aren't some early-career developers who just like hot code reloading or some such.
[1] Yes, I do think Rust is more hardware-efficient than JS, but here I'm looking at an even bigger picture. And yes, if you rewrite from JS to C++/Java/Rust/Go you can win on hardware, but as I said at the very beginning, any such rewrite is not really "an optimisation".
> To put it simply, if a program uses a lot of CPU it doesn't make sense for it to use little RAM
The fallacy in that reasoning is that a program that's using a lot of CPU (especially if it's a huge MLOC-sized app) is most likely using up its CPU on memory throughput, not pure number-crunching compute! So at least for the enterprise-app case (not pure number crunching), you'd actually need a tunable tradeoff between memory throughput and total RAM footprint, and adopting copying/moving GCs just doesn't give you that. Collection cycles are a huge burden on memory throughput: thus, indirectly, on the very thing you're calling "CPU". The theoretical prospect of winning by forgoing collection cycles outright (pure bump arena allocation) is explicitly excluded here, since we're talking about long-running programs that will at some point need to garbage collect.
Heap allocation may have marginally higher "CPU" use in the pure compute sense, but that's exactly the kind of CPU use that does trade off successfully with a lower RAM footprint.
Similarly, non-moving concurrent garbage collectors like Go's also successfully navigate this tradeoff compared to moving/copying collectors, because their collection work - while compute-intensive and to some extent memory-traffic-intensive (though less so than if copying/moving memory were involved!) - can be largely (though not completely; some minor compute overhead on the hot path is still present) shunted off to a lower-priority background thread.
On the other side of the tradeoff, arenas and caches increase memory footprint in a way that's low-impact on memory throughput (unlike pervasive use of a copying/moving GC) because only live data is accessed as needed, and deallocating the arena is a single operation. The tradeoff is actually highly favorable to low-level languages, which commonly use arenas to manage challenges with heap allocation such as fragmentation.
> The fallacy in that reasoning is that a program that's using a lot of CPU (especially if it's a huge MLOC-sized app) is most likely using up its CPU on memory throughput, not pure number-crunching compute!
No, there's no such fallacy here because that assumption is not needed for the conclusion. The point is that CPU is needed to use RAM, and so if you use CPU for whatever reason - even to loop for an hour over some integers - you are consuming a resource that is needed to use RAM (by another program). So the use of CPU "captures" RAM whether it uses RAM or not, so it might as well use it.
The extreme example I gave was that a program that uses 100% CPU (again, even if it uses zero RAM) effectively captures 100% of RAM, because no other program can use any RAM while that program is running. This extreme example is just to build some intuition, but it scales to lower CPU utilisations.
> Collection cycles are a huge burden on memory throughput
I don't even know where to start. The whole point of moving collectors is that they can make the cost of memory management arbitrarily low, reducing the overhead compared to free-list approaches. A collection cycle does a constant amount of work (per program & workload), but the frequency of collections can be made arbitrarily low. This is memory management 101.
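The textbook back-of-envelope (a sketch of the standard argument, not a measurement): a copying collector's cycle cost is proportional to the live set only, and that cost is amortised over the heap headroom allocated between cycles.

    class GcMath {
        // Each cycle copies only live bytes; between cycles the program
        // allocates (heap - live) bytes. For a fixed live set, the cost
        // per allocated byte falls as the heap grows - that's the
        // RAM-for-CPU knob.
        static double gcCostPerAllocatedByte(double copyCostPerByte,
                                             double liveBytes,
                                             double heapBytes) {
            return copyCostPerByte * liveBytes / (heapBytes - liveBytes);
        }
        // E.g. once heapBytes >> liveBytes, doubling the heap roughly
        // halves the CPU spent on collection.
    }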
The problem of moving collectors has traditionally been the impact on latency, not on throughput (they were always better on throughput than free lists) - until the advent of pauseless moving collectors.
> Heap allocation may have marginally higher "CPU" use in the pure compute sense, but that's exactly the kind of CPU use that does trade off successfully with a lower RAM footprint.
Except it doesn't given the actual economics of RAM and CPU. You'll need to wait for my talk to be posted to YouTube (I can't reproduce it all here), but in the meantime you can watch this one, by my colleague, which was a keynote at the most recent ISMM (International Symposium on Memory Management): https://youtu.be/mLNFVNXbw7I
The problem is that Erik is one of the world's leading experts on memory management, and he's talking to other experts, so his talk assumes quite a bit of familiarity with the subject. Also, his comparison focuses on tracing GCs, leaving the comparison with malloc/free implicit.
> Similarly, non-moving concurrent garbage collectors like Go's also successfully navigate this tradeoff compared to moving/copying collectors
Again, except they do not. Go users experience severe problems with the GC that Java users no longer do, precisely because of the inefficiencies of Go's simple GC (the JDK used to have such a GC, but we removed it five years ago when newer, more sophisticated algorithms yielded better results).
> On the other side of the tradeoff, arenas and caches increase memory footprint in a way that's low-impact on memory throughput (unlike pervasive use of a copying/moving GC)
This is simply not true, and it shows unfamiliarity with how modern moving collectors actually work (remember that the first open-source high-throughput, pauseless moving collector was released only two and a half years ago). Moving collectors offer pretty much the same tradeoff as arenas. The key points are:
1. A generational design makes copying a relatively rare operation to begin with (only a relatively small number of objects are copied).
2. The frequency of collections can be made arbitrarily low.
If the usage happens to be arena-like, i.e. no objects survive, nothing is copied (unrelated long-lived objects are already in the old gen, and because the old gen is untouched, there's no need to compact anything there).
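The allocation hot path makes the comparison concrete. A sketch (the field names are mine) of the common case in a TLAB, the thread-local allocation buffer a moving collector gives you - the same pointer bump an arena performs:

    class Tlab {
        long top, end; // bump pointer and buffer limit for this thread

        long allocate(int size) {
            long p = top;
            if (p + size > end)        // rare: buffer exhausted ->
                return slowPath(size); //   get a fresh buffer, maybe collect
            top = p + size;            // common case: a single pointer bump
            return p;                  // address of the newly allocated object
        }

        long slowPath(int size) { /* request a new buffer from the heap */ return 0; }
    }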
The reasons for not using moving collectors have nothing to do with throughput:
1. Latency used to suffer. Low-latency moving collectors were a very advanced technology. The first open-source one is younger than ChatGPT.
2. Moving collectors impact the design of FFI (with C, etc.), as C (etc.) does not support moving pointers, for reasons having nothing to do with performance. Languages that want a very direct and simple FFI (as FFI is very common in low-level code) have a hard time implementing efficient moving collectors.
3. Good moving collectors require large expert teams (let alone pauseless moving collectors). The languages that have them (to varying degrees of sophistication and performance) are well-funded ones; in particular, the JDK team, the .NET team, and the V8 team. Good allocators (for malloc/free) are also big and sophisticated beasts these days, but they're much easier to reuse across languages (i.e. not only by C/C++/Zig/Rust, but also by Python). Effectively, large pieces of those languages' runtimes are "offshored" to unrelated specialist teams.
> The tradeoff is actually highly favorable to low-level languages, which commonly use arenas to manage challenges with heap allocation such as fragmentation.
This is also not true (and I say this because I'm primarily a low-level programmer, and have been doing low-level programming for over 25 years). First, C++ and Rust in particular make arenas hard to use to their full power (which is one of the several reasons low-level programmers prefer Zig). Second, if you're not familiar with the severe costs of memory management in low level languages, I can only conclude you haven't been doing it for very long.
Java was designed, among other things, to reduce the severe and hard-to-fix performance problems that many C++ programs had experienced (and do to this day). The hard problems concern both the limitations of AOT compilation and of free-list-based memory management. I'm not saying all C++ programs suffer from these issues, but a huge class of them do, which is one of the reasons large programs have migrated to Java.
I feel like this has been a great discussion but all the good technical talk is tapering off. There's more rhetorical semantics now than anything. But I appreciate you teaching me. I'm a bit tired in this reply.
I wish I could find you a few reports from people on here basically renouncing Java because they could not optimize it any further after 20 years programming in it, and moving to Rust. I'd be curious what you'd think.
> You say "dev experience" as if it's some quality-of-life thing. They traded off a cheap resource, RAM, for an eternal maintenance and evolution cost that would only grow higher as the program grows
No, I say dev experience because I brought it up earlier, and staked my claim on it. Because it's an umbrella term that covers nice-to-haves and how ergonomic the language and ecosystem are. The antithesis would be lots of repetitive plumbing that slows down feature release. It's one part of the triangle. They now are shipping way behind but wound up with a product that is just as fast and uses less RAM. That kind of decision can matter to other projects. Smaller projects likely wouldn't have such a slow turnaround.
Again, we're not advocating for everyone to do rewrites. Threads like this are people begging app developers to stop using stuff not appropriate for desktop apps.
> I said that low-level languages can offer good performance in small programs, but many desktop apps aren't small. Claude Code's CLI is over 500KLOC.
Err, Claude Code is very special, yes. Couldn't tell you why that tool needs to use that much, but most desktop apps don't. They are built on vendor code and keep the actual app code small, and that makes them excellent candidates for what we're talking about.
> 3MLOC is quite typical. People who work on smaller software are overrepresented in Silicon Valley and, I'm guessing, among HN readers, but they're the outliers
I don't agree, unless you have some stats I don't know about. I mean, I'm really jealous that you even know someone that worked on an app of that size. Most people are coding for one of the millions of mid-sized businesses dotted all over the country. They are on 20-year-old code bases that make great revenue. Everyone there is nice and meets with you weekly. The devs answer to the clients directly. They don't really need to grow their business endlessly, but there's lots of maintenance to do. I've worked with healthcare companies; they were certainly not 3M lines of code. What you're describing is an extremely narrow class of software that most people will never touch. But I think it sounds cool.
> If anything, benchmarks are much more anecdotal
Not really. I concede they only measure what they do, and it might not be much, but at least they measure something, and it's public and reproducible. Anecdotes are vague stories that are impossible to evaluate. I had an anecdote of someone saying they can't use Java anymore because, even with 20 years of experience, they cannot optimize Java any further for what they need. They rewrote it in Rust. It works much better now, and it's not even close. What am I to make of that?
> Her computer is slow not because she's using a program that eats up a lot of RAM, but because she's inadvertently running a lot of stuff in the background that shouldn't be running at all
There are absolutely apps that use 8GB of RAM. And page swaps are not good, even with an SSD. They're a real problem.
I just find it lame to tell people this when we could ship leaner apps, and it wouldn't even be that hard. It's 2026 and people should be able to have whatever windows they want open. Even 8GB of RAM is a lot; we've just forgotten about it. Shit, my web browser uses 4GB of RAM idle.
> So even though it is true that different programs may have different CPU/RAM usage patterns, you have to think about the ratio rather than CPU and RAM in isolation, and try to achieve some approximate balance. To put it simply, if a program uses a lot of CPU it doesn't make sense for it to use little RAM, because by using a lot of CPU it is effectively depriving other programs of their ability to use RAM (as that requires CPU). There are some exceptions, such as large caches, but the tradeoffs there are very different and too complicated to go into here (I did cover that in my talk).
That's a cool realization. But I think it's slippery. Only in extreme scenarios will your CPU actually block RAM. The 100% usage scenario makes sense. But most of the time, your CPU is going to be underutilized and capable of letting every app use RAM freely. Obviously the more direct problem would be someone's RAM being sucked up by different apps.
> It's not that the software industry has decided to waste users' money. If it did, there would be a competitive edge to programs that use less RAM, but we don't see that competitive edge. What we do see is a few people on HN saying how they simply can't live with VS Code's 50ms keystroke latency and how amazing some other editor with only 20ms latency is - one that's likely to go out of business soon
How? If you're forced to use work software, you have no competition to go to. Same for your music app, your social network, your team's chat tool. The "competitive edge" argument requires users to actually have a choice, and for most desktop software they don't: they use what their employer, school, or social network has standardized on. Where users do have free choice, they gravitate toward leaner options constantly. Sublime kept paying customers against free Electron alternatives. Mobile platforms enforce resource discipline and have no Electron equivalent. The competitive edge for leanness exists; it just can't express itself when ecosystem effects lock users in.
So I still feel strongly there are cases where you could make sufficiently small programs in Rust that wouldn't devolve into spaghetti. That would give you great performance and ram usage. I still want to find the example of my anecdote, but I'm tired.
> No, I say dev experience because I brought it up earlier, and staked my claim on it. Because it's an umbrella term that covers nice-to-haves and how ergonomic the language and ecosystem are. The antithesis would be lots of repetitive plumbing that slows down feature release. It's one part of the triangle. They now are shipping way behind but wound up with a product that is just as fast and uses less RAM.
... and will cost much more to evolve; costs will never drop and may well rise. These costs are higher than the RAM they saved, which was free anyway, because it couldn't be used for anything else.
> That kind of decision can matter to other projects. Smaller projects likely wouldn't have such a slow turnaround.
> Again, we're not advocating for everyone to do rewrites. Threads like this are people begging app developers to stop using stuff not appropriate for desktop apps.
The people begging are sometimes right and sometimes really wrong. They are sensitive to certain things but don't consider the full picture.
> I said that low-level languages can offer good performance in small programs, but many desktop apps aren't small. Claude Code's CLI is over 500KLOC.
> Couldn't tell you why that tool needs to use that much, but most desktop apps don't. They are built on vendor code and keep the actual app code small, and that makes them excellent candidates for what we're talking about.
VS Code is something like 5MLOC. Slack is probably around a million. Every desktop app (that isn't bundled with the OS) that I or people I know use are roughly that size or bigger.
> What you're describing is an extremely narrow class of software that most people will never touch. But I think it sounds cool.
First of all, this is the class of software that most people rely on the most by far. It certainly contributes more economic value than other software. You use that software every time you tap your bank card; every time you place or receive a call or send or receive a text message; every time a package arrives at your doorstep; every time you watch any video on any platform; every time you receive medical treatment or stay at a hotel. Clearly, it's the class of software that contributes the majority of value that software delivers. I don't know exactly how many people work on such software. I think around 50% of developers at least, but even if I'm wrong, we're talking at least a few million developers.
> but at least they measure something, and it's public and reproducible.
There is zero value to a measurement that is not relevant to you, and negative value to making you think it's relevant ("well, it's not the number I need but it's some number I have so I'll go by that") when it's not.
> Anecdotes are vague stories that are impossible to evaluate.
You can evaluate them at least as well as you can benchmarks - by asking questions that will allow you to know if the information is relevant to your use case - only they at least have a chance of being relevant, while benchmarks really rarely are. I will just say that if you don't know how exactly a memory allocator is implemented (e.g. whether and how it degrades with time), it is absolutely impossible for you to evaluate a benchmark that purports to measure memory management. There is nothing you can learn from it because you don't really know what it is that's been measured.
> I had an anecdote of someone saying they can't use Java anymore because, even with 20 years of experience, they cannot optimize Java any further for what they need. They rewrote it in Rust. It works much better now, and it's not even close. What am I to make of that?
You are to make of it that, in the past 20 years at least, we have no ability to extrapolate from one program to another. Given that languages like C++, Rust, Java, Zig, and C# are all at the topmost performance category, it makes a lot of sense that in some situations X will be faster than Y and in others Y will be faster than X.
> There are absolutely apps that use 8GB of RAM. And page swaps are not good, even with an SSD. They're a real problem.
I've not seen that in quite a few years now. Right now I have Chrome open on four streaming platforms. Just over 500MB. I have over 100 tabs open in Safari. About 1.2GB. I'm sure there are some apps that use 8GB of RAM, but these aren't apps that grandma uses or wants to use.
> That's a cool realization. But I think it's slippery. Only in extreme scenarios will your CPU actually block RAM. The 100% usage scenario makes sense.
No. This scales. Again, every CPU cycle you consume takes a cycle away from some other program and reduces its ability to use RAM (as that requires the cycle).
> But most of the time, your CPU is going to be underutilized and capable of letting every app use RAM freely.
No. Using RAM means using CPU. If you're idle, then you're not using RAM, and your data might as well be paged out to SSD. You get zero points for leaving RAM free; you save $0.
> How? If you're forced to use work software, you have no competition to go to. Same for your music app, your social network, your team's chat tool. The "competitive edge" argument requires users to actually have a choice, and for most desktop software they don't: they use what their employer, school, or social network has standardized on. Where users do have free choice, they gravitate toward leaner options constantly. Sublime kept paying customers against free Electron alternatives. Mobile platforms enforce resource discipline and have no Electron equivalent. The competitive edge for leanness exists; it just can't express itself when ecosystem effects lock users in.
The competitive edge does not require users to have a choice. It requires a real edge. If some software really makes you more productive, then your employer would be foolish not to buy it. If some software allows the school to significantly save on hardware - the same. The reason these products don't catch on is because they're written by hackers with certain sensitivities who do not understand the economics of their users' hardware and software.
> So I still feel strongly there are cases where you could make sufficiently small programs in Rust that wouldn't devolve into spaghetti. That would give you great performance and ram usage.
Sure. Like I said, low-level programs are fast and efficient for small programs. But even then, you need to look at the full picture to know how much, if any, money you're saving.
Thank you for walking me through the CPU-cycles-capturing-RAM thing; that makes more sense. Obviously I have a lot to learn here. For the record, I don't have a problem with Java. And in rereading the whole conversation, I can see you stipulated exactly what I'm arguing about early on:
> The question isn't why apps use a lot of RAM, but what the effects of reducing it are. Reducing memory consumption by a little can be cheap, but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices
That sums up what I've been getting at much better.
> and will cost much more to evolve; costs will never drop and may well rise. These costs are higher than the RAM they saved, which was free anyway, because it couldn't be used for anything else
I guess I'm suggesting: if it evolves. My experience with smaller apps is spread across dozens of businesses that are 20+ years old. 1M LOC is not a required thing. In fact, for many businesses, you probably couldn't reach that many LOC without inventing busy work for devs. Sometimes my job is ripping features OUT that a different dev company put in, and nobody knows why anymore.
> The people begging are sometimes right and sometimes really wrong. They are sensitive to certain things but don't consider the full picture
That's a true statement in isolation. I think it's reasonable to not want a web browser embedded in multiple apps on your computer. Slack and Spotify use more RAM than Steam. For what each app does, that seems absurd to me. Again, that's not a bad tradeoff from a development velocity perspective.
> First of all, this is the class of software that most people rely on the most by far. It certainly contributes more economic value than other software.
But the fact that this type of software has more devs is different from saying the average project has the same considerations. I wouldn't tell someone to use Kubernetes because FAANG uses it, and that means a lot of devs use it. If you estimate 50% of developers to be working on this kind of software, I estimate 5% of them have any choice over what tech stack they are using in the first place. So when you are making tech stack recommendations and saying "C++ is not fast for apps", you are talking to the other 50%.
> There is nothing you can learn from it because you don't really know what it is that's been measured
That's true. I downloaded the benchmarks and ran them myself and played around with them. But I lean on others for technical evaluation. My understanding of low level programming ends at toy projects and what I've read about cache/cpu. I imagine if you develop the JVM it's frustrating to continuously talk to people about isolated benchmarks.
Edited out a lot of rhetorical arguing after rereading the conversation
> If it evolves. My experience with smaller apps is spread across dozens of businesses that are 20+ years old. 1M LOC is not a required thing.
Just to be clear, the cost of maintaining a program in a low-level language is always higher. That's easily the #1 reason the use of low-level languages has been declining steadily for a few decades now with no hint of a change in direction. What happens in large programs is that low-level languages become slow. So yes, if that program doesn't grow, it will probably not become slow, but they've already paid more on development and continue to pay more on maintenance than any savings they could have made on memory, which are probably zero or nearly that.
The point of my explanation about the RAM/CPU relationship is that a well-balanced ratio is free. If your CPU usage amounts to some X% of RAM "captured", any memory savings below that level translate to $0 in savings. It's sort of like ink and paper: they're used in combination, so reducing the consumption of one without the other doesn't really save you anything.
> I think it's reasonable to not want a web browser embedded in multiple apps on your computer.
I don't know why that would be reasonable unless you can show me it's a waste of money. Maybe it is, but I'm not sure.
> Slack and Spotify use more RAM than Steam. For what each app does, that seems absurd to me.
But software is written to deliver value to users. Most software has no intrinsic value. Sometimes in an economy you get things that may seem absurd - I can't think of a good example, but say that you can only buy rope in units of 1m - but make sense once you consider the entire system. Could Slack use much less RAM than Steam? Of course! Should it, though? I don't know.
> Again, that's not a bad tradeoff from a development velocity perspective.
And again, what you call "development velocity" is not some vanity metric, but something that can translate to actual money savings for the user more than reducing RAM consumption.
> Do you disagree that making the equivalent app in Avalonia, JavaFX, QT would likely use less RAM and CPU than Electron? Is there not room to trim RAM usage in the Desktop world without harming the CPU?
There probably is, but as I said in the beginning, switching a language is a large investment, not exactly an optimisation (and it might not be worth it).
> But the fact that this type of software has more devs is different from saying the average project has the same considerations.
Well, that depends what you mean by "the average project". If we're counting by number of programs/repos, the median project size may well be a 100 line script. We have to weigh it by something. Number of devs and lines of code are probably highly correlated, so either one would do.
> I estimate 5% of them have any choice over what tech stack they are using in the first place
I don't understand the point you're trying to make. I don't really care what someone working on some small website does because getting that tech stack wrong is of little consequence anyway. For software that "matters", the choice of tech also matters, and you're right that the junior developers (and probably many senior developers) working on those projects don't choose the tech, but somebody does, and these are the choices that matter. For example, you care about Slack's tech choice. That was also some high-level decision. If they got it wrong, it wasn't their junior programmers who made the mistake.
> I imagine if you develop the JVM it's frustrating to continuously talk to people about isolated benchmarks.
Yes, but everyone who deals with software performance has been frustrated by this for a long time. Benchmarks used to be at least somewhat more informative until the late '90s. I don't know how to educate developers more about this, but I hope someone manages to do it.
> Well that is a very different statement from what you said earlier, which is "C++ and Rust are simply not particularly fast for applications, and Java is." You have been painting a picture that it is essentially impossible to top Java with Rust except in the narrowest of situations.
It is generally hard to beat Java in large programs. It is always theoretically possible because you can view every Java program as a C++ program (which is what the HotSpot JVM is) running on some data, but it's hard, and I would say close to impossible for similar costs.
Only if the software is optimised for either in the first place.
There's a ton of software out there where optimisation of both memory and CPU has been pushed aside because development hours are more costly than a bit of extra resource usage.
The tradeoff has almost exclusively been development time vs resource efficiency. Very few devs are graced with enough time to optimize something to the point of dealing with the theoretical tradeoff balances of near-optimal implementations.
That's fine, but I was responding to a comment that said that RAM prices would put pressure to optimise footprint. Optimising footprint could often lead to wasting more CPU, even if your starting point was optimising for neither.
My response was that I disagree with the conclusion that something like "pressure to optimize RAM implies another hardware tradeoff" is the primary thing that will give; I'm not changing the premise.
Pressure to optimize more often just means setting aside the time to bring the program closer to its algorithmic bounds, rather than doing whatever was quickest to implement and not caring about any of it. Given the same amount of time, replacing bloated abstractions with something more lightweight usually nets more memory gains overall than trying to tune something heavy to use less RAM at the expense of more CPU.
Some of the algorithms are built deep into the runtime. E.g. languages that rely on malloc/free allocators (which require maintaining free lists) are making a pretty significant tradeoff of wasting CPU to save on RAM, as opposed to languages using moving collectors.
GC burns far more CPU cycles. Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM. Most tasks stall on IO. Those that don't typically stall on either memory bandwidth or latency. Meanwhile CPU bound tasks typically don't perform allocations and if forced avoid the heap like the plague.
Far less for moving collectors. That's why they're used: to reduce the overhead of malloc/free based memory management. The whole point of moving collectors is that they can make the CPU cost of memory management arbitrarily low, even lower than stack allocation. In practice it's more complicated, but the principle stands.
The reason some programs "avoid the heap like the plague" is because their memory management is CPU-inefficient (as in the case of malloc/free allocators).
> Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM
There is a fundamental relationship between CPU and RAM. As we learn in basic complexity theory, the power of what can be computed depends on how much memory an algorithm can use. On the flip side, using memory and managing memory requires CPU.
To get the most basic intuition, let's look at an extreme example. Consider a machine with 1 GB of free RAM and two programs that compute the same thing and consume 100% CPU for their duration. One uses 80MB of RAM and runs for 100s; the other uses 800MB of RAM and runs for 99s (perhaps thanks to a moving collector). Which is more efficient? It may seem that we need to compare the value of 1% CPU reduction vs a 10x increase in RAM consumption, but that's not necessary. The second program is more efficient. Why? Because when a program consumes 100% of the CPU, no other program can make use of any RAM, and so both programs effectively capture all 1GB, only the second program captures it for one second less.
This scales even to cases where CPU consumption is below 100%, because the important thing to realise is that the two resources are coupled. The thing that needs to be optimised isn't CPU and RAM separately, but the RAM/CPU ratio. A program can be less efficient by using too little RAM if using more RAM would reduce its CPU consumption toward the right ratio (e.g. via a moving collector), and vice versa.
There are (at least) two glaring issues with your analysis. First, the vast majority of workloads don't block on CPU (as I previously pointed out) and when they do they almost never do heap allocations in the hot path (again, as I previously pointed out). Second, we don't use single core single thread machines these days. Most workloads block on IO or memory access; the CPU pipeline is out of order and we have SMT for precisely this reason.
Anyway, I'm not at all inclined to blindly believe your claim that malloc/free is particularly expensive relative to various GC algorithms. At present I believe the opposite (that malloc/free is quite cheap), but I'm open to the possibility that I'm misinformed about that. You're going to need to link to reputable benchmarks if you expect me to accept the efficiency claim, but even then that wouldn't convince me that any extra CPU cycles were actually an issue, for the reasons articulated in the preceding paragraph.
> There are (at least) two glaring issues with your analysis. First, the vast majority of workloads don't block on CPU (as I previously pointed out) and when they do they almost never do heap allocations in the hot path (again, as I previously pointed out). Second, we don't use single core single thread machines these days. Most workloads block on IO or memory access; the CPU pipeline is out of order and we have SMT for precisely this reason.
This doesn't matter because if you're running a single program on a machine, it might as well use all the CPU and all the RAM. As long as you're under 100% on both, you're good. But we want to utilise the hardware well because we typically want to run multiple programs (or VMs) on a single machine, and the machine is exhausted when the first of CPU or RAM is exhausted. So the question is how should your CPU and RAM usage be balanced to offer optimal utilisation given that the machine is spent when the first of CPU and RAM is spent. E.g. you can only run two programs, each using 50% of CPU; if they each use only 5% of RAM, you've saved nothing as no third program can run. So if you spend either one of these resources in an unbalanced way, you're not using your hardware optimally. Using 2% more CPU to save 200MB of RAM could be suboptimal.
I'm not saying that for every program that uses X% CPU should also use exactly X% of RAM or it must be wasting one or the other, but that's the general perspective of how to think about efficiency. Using a lot of one and little of the other is, broadly speaking, not very efficient.
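To put the 50%-CPU/5%-RAM example in concrete terms, here's a minimal sketch (the `footprint` helper is my own hypothetical accounting, not an established metric):

```c
#include <stdio.h>

/* A machine is spent when the FIRST of CPU or RAM runs out, so a workload's
 * effective footprint is the larger of its two fractional shares. */
static double footprint(double cpu_frac, double ram_frac) {
    return cpu_frac > ram_frac ? cpu_frac : ram_frac;
}

int main(void) {
    /* Two programs, each at 50% CPU but only 5% RAM: the machine is full
     * (2 x 0.50 of CPU) while 90% of its RAM sits stranded. */
    printf("per-program footprint: %.2f\n", footprint(0.50, 0.05)); /* 0.50 */
    /* Balanced usage would let the idle RAM do useful work (e.g. reduce
     * CPU spent on memory management) for the same machine capacity. */
    return 0;
}
```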
> Anyway I'm not at all inclined to blindly believe your claim that malloc/free is particularly expensive relative to various GC algorithms. At present I believe the opposite (that malloc/free is quite cheap) but I'm open to the possibility that I'm misinformed about that.
You are.
> You're going to need to link to reputable benchmarks if you expect me to accept the efficiency claim, but even then that wouldn't convince me that any extra CPU cycles were actually an issue for the reasons articulated in the preceding paragraph.
I don't believe there are any reputable benchmarks of full applications (which is where memory-management matters) that are apples-to-apples. I'm speaking from over two decades of experience with C++ and Java.
The important property of moving collectors is that they give you a knob that allows you to turn RAM into CPU and vice-versa (to some extent), and that's what you want to achieve the efficient balance.
Moving collectors as generally used are a huge waste of memory throughput, and this shows up consistently in the performance measurements. Moving data is very expensive! The whole point of ownership tracking in programming languages is so that large chunks of "owned" data can just stay put until freed, and only the owning handle (which is tiny) needs to move around. Most GC programming languages do a terrible job of supporting that pattern.
That's just not true. To give you a few pieces of the picture, moving collectors move little memory and do so rarely (relative to the allocation rate):
In the young generation, few objects survive, so few are moved (the rare long-lived ones are promoted into the old gen). In the old generation, most objects survive, but the allocation rate is so low that moving them is rare. (The memory-management technique in the old gen matters less precisely because the allocation rate there is so low; whether you want a moving algorithm in the old gen is less about speed and more about other concerns.)
On top of that, the general principle of moving collectors (and why in theory they can be cheaper than stack allocation) is that the cost of each collection is roughly constant for a given workload (it depends on the surviving live set, not on how much garbage was allocated), while the frequency of collections can be made as low as you want by using more RAM.
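As a toy model of that principle (my own simplification with made-up constants, not any collector's documented behaviour): each cycle copies roughly the live set, and a cycle is triggered once per heap-minus-live bytes allocated, so the overhead can be driven toward zero by growing the heap:

```c
#include <stdio.h>

/* Toy model: per-cycle copying work scales with the live set L, and a cycle
 * fires once every (H - L) bytes allocated. So GC CPU overhead per second
 * ~ copy_cost * L * A / (H - L), which -> 0 as the heap H grows. */
int main(void) {
    double live_mb = 64.0;            /* live set L (assumed)            */
    double alloc_rate_mb_s = 512.0;   /* allocation rate A (assumed)     */
    double copy_cost_s_per_mb = 1e-3; /* assumed cost to copy 1 MB       */

    for (double heap_mb = 128; heap_mb <= 4096; heap_mb *= 2) {
        double cycles_per_s = alloc_rate_mb_s / (heap_mb - live_mb);
        double gc_cpu = cycles_per_s * live_mb * copy_cost_s_per_mb;
        printf("heap %5.0f MB -> GC CPU %.4f cores\n", heap_mb, gc_cpu);
    }
    return 0; /* e.g. 0.512 cores at 128 MB vs ~0.008 cores at 4 GB */
}
```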
The reason moving collectors are used in the first place is to reduce the high overhead of malloc/free allocators.
Anyway, the general point I was making above is that a machine is exhausted not when both CPU and RAM are exhausted, but when one of them is. Efficient hardware utilisation means the program strikes a good balance between them. There's not much point in reducing RAM footprint when CPU utilisation is high, or in reducing CPU consumption when RAM consumption is high. Using much of one and little of the other is wasteful when you can reduce the higher one by increasing the other. Moving collectors give you a convenient knob to do exactly that: if a program consumes a lot of CPU and little RAM, you can increase the heap and turn some RAM into CPU, and vice versa.