When I was building my computing stack out of x86 machine code, I noticed that even if my high-level language only had signed numbers (I'm still pretty brainwashed by C, which leads to the conclusions of OP), I still needed the ISA's unsigned jumps to deal with addresses (which can have the MSB set). So my big "insight" was to name unsigned comparisons "address comparisons".
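A quick C sketch of the issue (the addresses are made up for illustration): once the MSB is set, a signed comparison sees a negative number, which is exactly why the signed jumps (JL/JG on x86) give the wrong answer for addresses and the unsigned ones (JB/JA) are needed.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* A low address, and one with the most significant bit set. */
        uintptr_t lo = 0x1000;
        uintptr_t hi = ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1)) | 0x1000;

        /* Unsigned comparison (x86: CMP + JB): correct for addresses. */
        printf("unsigned: lo < hi -> %d\n", lo < hi);   /* prints 1 */

        /* Signed comparison (x86: CMP + JL): on the usual two's-complement
           machines, hi now looks negative, so the comparison flips. */
        printf("signed:   lo < hi -> %d\n",
               (intptr_t)lo < (intptr_t)hi);            /* prints 0 */
        return 0;
    }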
I really appreciate your comments in this thread adrian_b. Could you point me at a brief summary of how Ada (or Pascal?) non-negative ints work? What is a compile error, what is a guaranteed run-time error, etc.
My comment was a bit tongue-in-cheek. Obviously it is a hard problem. But in a profession where we work with machines that were literally made to crunch numbers, and where abstraction is something we deal with daily, why can't we have a performant abstraction for doing arbitrary calculations? The answer is that to be performant it must be solved in hardware, which would cost more than the hardware we have.
So in fact the answer is not just that it's a hard problem; it's that the cost-benefit is still not there. It's as if it's just not a very important problem (in an economic sense). And that is what surprises me, given that computers were made to do arbitrary calculations.
I used to imagine that, for someone in construction, a wall must be a really simple thing. But it's only simple after millennia of building walls. So I now have lots of grace and patience for humanity to figure out numbers in computers, whether integers or reals.
Your explanation is possibly the same, just in different words. It's a hard problem and probably needs a whole lifetime. But it's in no single person's economic interest to devote to it the time it needs (not to mention the diverse skills required; once one has a solution, one has to pitch it to the world). And so it will happen over a hundred lifetimes.
High-bandwidth memory (HBM) can deliver TB/s of memory bandwidth and has completely shattered the memory wall for individual cores/compute elements. The only way for compute to keep up is to go wide and parallel, as seen in GPUs.
Despite this, massively increased memory bandwidth does not translate to material performance improvements on non-parallel compute tasks, because few such tasks are actually memory-bandwidth bound; they are memory-latency bound instead.
The best-known general solutions for improving memory latency are per-compute-element memory caches. Unfortunately, these increase the complexity and size of your compute elements, forcing you to reduce their number, yet a large number of compute elements is the only way to saturate HBM memory bandwidth.
To keep up, the best-known techniques are either to batch algorithmically, which lets you go wide with vector/batch instructions, or to go the GPU route with memory-latency-hiding parallelism.
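A toy C sketch of the distinction (the functions are mine, purely illustrative): the first loop issues independent loads the hardware can prefetch and pipeline, so it can actually use the bandwidth; the second serializes on the full memory latency of every load, so extra bandwidth buys it nothing.

    #include <stddef.h>

    /* Bandwidth-bound: independent loads that the hardware can prefetch,
       pipeline, and vectorize. Throughput scales with memory bandwidth. */
    long stream_sum(const long *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Latency-bound: each load depends on the previous one, so the core
       waits out the full memory latency once per node. More bandwidth
       does not help; only caches or latency-hiding parallelism do. */
    struct node { struct node *next; long v; };

    long chase_sum(const struct node *p) {
        long s = 0;
        while (p) {
            s += p->v;
            p = p->next;
        }
        return s;
    }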
Well… the reason there's such a big mismatch is the memory controller. Something like 80-90% of the energy is spent moving data in and out because of the complex addressing. If you move compute into the RAM and instead shuttle instructions in and out, you might get a huge speed up. The challenge is when an instruction references some data over there; that may end up eliminating all the advantage. But I believe people are trying to commercialize this concept.
> If you move compute into the RAM and instead shuttle instructions in and out, you might get a huge speed up.
Isn't that just a per-compute-element cache/local memory? You're proposing a scaled-up variety of NUMA where every compute core has its own local memory, and going outside it will cost you more.
Correct, you can think of this like NUMA or a distributed system where you have compute colocated with storage. It’s a special purpose accelerator for very specific problems that have been optimized to take advantage of such an architecture.
It's also not my proposal. The industry is exploring ways to cut down the energy requirements of AI: 80-90% of the energy consumption is just moving data back and forth across the memory controller. It has to read a row from a bank into a row buffer, access the specific cell being requested, shuttle it over the bus to the compute, and then write the data back to the cells. The current idea is to perhaps do the processing on the entire row buffer, but you could imagine scaling that up to do it at the bank level. The challenges are manufacturing complexity (DRAM is made on a different process than logic), heat from the ALU, etc.
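As a back-of-the-envelope sketch in C (the energy constants are placeholders I made up to show the shape of the argument, not measured numbers): if the bus transfer dominates per-word energy, computing on the row buffer amortizes one activation and skips the bus almost entirely.

    #include <stdio.h>

    /* Hypothetical per-operation energy costs in picojoules; illustrative
       only. The point is the ratio: the bus transfer dominates. */
    #define E_ACTIVATE_ROW  1000.0  /* read a row into the row buffer  */
    #define E_COLUMN_ACCESS   10.0  /* access one word in the buffer   */
    #define E_BUS_TRANSFER   100.0  /* shuttle one word to the compute */
    #define ROW_WORDS        1024

    int main(void) {
        /* Conventional: activate, then move every word across the bus. */
        double conventional = E_ACTIVATE_ROW
            + ROW_WORDS * (E_COLUMN_ACCESS + E_BUS_TRANSFER);

        /* Processing-in-memory: activate once, compute on the row
           buffer, send back only the result. */
        double in_memory = E_ACTIVATE_ROW
            + ROW_WORDS * E_COLUMN_ACCESS + E_BUS_TRANSFER;

        printf("conventional: %.0f pJ, in-memory: %.0f pJ\n",
               conventional, in_memory);
        return 0;
    }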
Oh, my knowledge is woefully out of date. But I believe the memory wall is a fact of life for the most part. Like many others, I nibbled around the edges of the constraint at massive cost in increased complexity. Outside of very specific exceptions, the cure tends to be worse than the disease.
Cool! Also, carrying forward the post author's tongue-in-cheek humour... I see your Lua and raise you ~350 lines of Bash (excluding HTML templates). Take a good hard look at this shite [0]. Have you seen anything that is more obviously HTML-about-to-be-expanded-into-a-full-page?
That is... a cool trick! I don't know C, so it would never have occurred to me. (Let's be honest; it would not have occurred to me even if I knew C :))
What I like about the Bash + Heredocs + substitutions approach is that it looks and feels declarative to me. Template "expansion" is just in-place substitutions (parameter expansion, command substitution, process substitution); no string munging is needed to "place" or "inject" content in the HTML tree.
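A minimal sketch of the shape of it (the file names and variables are invented here, not lifted from the repo above): the heredoc is the HTML, and the shell expands variables and commands in place as it writes the page out.

    #!/usr/bin/env bash
    # Hypothetical page generator: parameter expansion fills in scalars,
    # command substitution splices in whole files and computed values.
    title="Hello"
    body_file="posts/hello.html"

    mkdir -p out
    cat <<EOF > out/hello.html
    <!DOCTYPE html>
    <html>
      <head><title>${title}</title></head>
      <body>
        $(cat "${body_file}")
        <footer>Generated on $(date +%F)</footer>
      </body>
    </html>
    EOF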
Anyway, I spent way too much time figuring that out, and it works well for me, so I'm going to just roll with the sunk costs :D
I buy "licenses" to read DRM'd ebooks all the time. Even though I disagree with DRM, I still want to reward the author/editor/illustrator/translator/etc for their artistry.
As for whether I leave the books in their DRM'd state or not? No comment :)
I thought this too for a long time. But I have since raised my estimate of the harms of DRM, and lowered my estimate of how much entertainment these artifacts provide. There is enough to read now without putting up with abusive relationships.
How many bits per pixel are you assuming, and are you imagining the red pixels are vertical neighbors with corresponding red pixels above and below, etc.?
At 8 bpp the effect is only colors moving vertically, up and down. After a while digging into it I realized why: most colors are too dark to be visible; we only see the most significant bits of each channel. And when those bits influence pixels to the left or right, the consequences are not visible.
Maybe I should try 2 bpp. Or some HSL mapping where I can clamp L.
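Something like this C sketch is what I have in mind for the clamp (the byte-to-hue mapping and the clamp band are arbitrary choices, just to illustrate the idea): lightness is held inside a visible band, so no value can disappear into near-black.

    #include <math.h>
    #include <stdio.h>

    /* Map a byte to RGB via HSL, with lightness clamped so every value
       stays visible. Standard HSL-to-RGB conversion; needs -lm. */
    static void byte_to_rgb(unsigned char v, unsigned char rgb[3]) {
        double h = v * (360.0 / 256.0);                /* hue from the byte */
        double s = 1.0;                                /* full saturation   */
        double l = fmin(fmax(v / 255.0, 0.35), 0.65);  /* clamp L           */

        double c = (1.0 - fabs(2.0 * l - 1.0)) * s;
        double x = c * (1.0 - fabs(fmod(h / 60.0, 2.0) - 1.0));
        double m = l - c / 2.0;
        double r, g, b;

        if      (h <  60.0) { r = c; g = x; b = 0; }
        else if (h < 120.0) { r = x; g = c; b = 0; }
        else if (h < 180.0) { r = 0; g = c; b = x; }
        else if (h < 240.0) { r = 0; g = x; b = c; }
        else if (h < 300.0) { r = x; g = 0; b = c; }
        else                { r = c; g = 0; b = x; }

        rgb[0] = (unsigned char)((r + m) * 255.0);
        rgb[1] = (unsigned char)((g + m) * 255.0);
        rgb[2] = (unsigned char)((b + m) * 255.0);
    }

    int main(void) {
        unsigned char rgb[3];
        byte_to_rgb(0x2a, rgb);
        printf("0x2a -> #%02x%02x%02x\n", rgb[0], rgb[1], rgb[2]);
        return 0;
    }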
I would love to compare notes; reading through Mu has given me a lot to think about, which I may write about soon.
I see in Mu a mirror of my own aspirations. GDSL is something I intend to take down to the metal once I have access to a computer where I can reach deeper than my Mac allows, though the path from here to an OS is by no means a straight one.
Mu is what I would call a MIX layer, the real substance of a process which turns one line of code into the next. Arguably, a MIX is the core of what makes a program a program, and the work of Mu, like Lisp and others, is to elevate it high enough that it becomes the whole interface.
For a deeply curious mind, the MIX is the only thread they wish to pull, because the comprehension of things at their most fundamental level is fuel enough. Unfortunately, the majority of people are not nearly so curious; thus the dominance of languages like Python.
So what makes Python so much more appealing than direct control? Pure TAST. Sugar for the hungry, and grammar that lets you say everything while knowing nothing. Somewhere between the sugar and the substance lives the real heart of what makes a tool, and that's what I've been picking at from both angles.
I would be curious to see how these could be unified: a Python TAST, a Rust or Haskell DRE (for type systems and borrow checking), and a Mu MIX underneath. Let the user be lured in by the promise of ease, look under the hood to see the entire compiler fitting in just under ten thousand lines, and burrow their way down to the MIX and fundamental understanding.
Good to see you kicking around, and here in a thread about things small enough to think about! I've not seen any blog posts from you in a while, Kartik, but I come back to your Lua musings from time to time.
You made my day with your kind words! I don't know if we've spoken before, but feel free to hit me up offline if you'd like to chat more about stuff like this.
https://akkartik.github.io/mu/html/mu_instructions.html