Oh... thank you for the reminder to try running the C version of this exploit on an Android phone over adb. The curiosity is now killing me.
Edit: for context, I work in embedded, and the aarch64 version (PR #42 in the repo) has successfully popped every device I've tried it against except one, where I run a custom kernel to work around a driver issue and (looking back at my git logs) had accidentally forgotten to enable the user-space API for alg_aead specifically. Lucky mistake.
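In case anyone wants to replicate the lucky-mistake check before trying: here's a minimal probe, assuming the standard AF_ALG socket interface (the kernel option in question should be CONFIG_CRYPTO_USER_API_AEAD). The bind() fails on kernels where the user-space AEAD type isn't exposed.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/if_alg.h>

    int main(void) {
        /* Ask the kernel for a user-space AEAD transform; bind()
           fails if CONFIG_CRYPTO_USER_API_AEAD was left out. */
        struct sockaddr_alg sa = {
            .salg_family = AF_ALG,
            .salg_type   = "aead",
            .salg_name   = "gcm(aes)",
        };
        int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        if (fd < 0) {
            perror("socket(AF_ALG)");   /* AF_ALG itself disabled */
            return 1;
        }
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            perror("bind(aead)");       /* AEAD user API missing */
            close(fd);
            return 1;
        }
        puts("AF_ALG aead interface is available");
        close(fd);
        return 0;
    }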
I don’t know how many different little models this uses under the hood, but I was shocked at how good it was at the couple of document extraction tasks I threw at it.
There’s an interesting thing there that I believe varies from person to person. My understanding is that some people think in a more symbolic/heuristic way, while others rely very heavily on their inner monologue to make sense of things. I’m in the latter camp, and I only have a single-core language processor, so I pretty much cannot come up with coherent thoughts while I’m concentrating on what someone else is saying.
Even more interesting, and getting off on a bit of a tangent, there is also a mode that I use for surfacing emotions that I don’t have words for (alexithymia): I open up a text editor, stare off into space, and let my fingers type without “observing” the stream of words coming out. I then go back and read what I “wrote” and often end up understanding how I’m feeling much better than I did. It’s weird.
Edit: also, playing with local models through e.g. llama.cpp in “thinking mode” is super fascinating to me. The “thought process” that comes out before the real answer often feels pretty familiar when I reflect on my own inner monologue, although sometimes it’s frustrating because I can see where the “thinking” went off the rails and want to correct it.
It's funny... reading this thread, I'm reminded of a friend of mine who indeed gets migraines from tomatoes. That was actually what she figured out first; the MSG connection came later.
> I bought a 2-in-1 and the experience is much better, simply because I can detach the keyboard and use it as a massive tablet. It's not as fluid as an iPad, but most of the time it's simply mildly annoying to get to the app/browser I want; then I scroll and tap the same way I would on an iPad. On my regular touchscreen laptop, I have to lift my fingers to use the interface, which simply adds delay for... the ability to scroll a PDF, AFAIK.
I have a work Lenovo Yoga and have had a similar experience with the 2-in-1. I actually appreciate that I can fold the keyboard all the way back underneath and use it as a tablet; I'll sometimes use that for doing document reviews on the couch. I've also used it folded, hmmm, 290 degrees or so as a touch interface for some monitoring software. Windows has APIs that report to applications when the machine switches into tablet mode, and applications that auto-switch their UI to bigger buttons etc. are much appreciated.
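For the curious, here's a quick sketch of one such check, assuming the classic Win32 metric (newer apps may go through the WinRT UIViewSettings instead):

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* SM_CONVERTIBLESLATEMODE: 0 means slate/tablet mode,
           nonzero means laptop mode. Apps can also listen for
           WM_SETTINGCHANGE with lParam "ConvertibleSlateMode"
           to react when the mode flips. */
        int laptop = GetSystemMetrics(SM_CONVERTIBLESLATEMODE);
        printf("Currently in %s mode\n", laptop ? "laptop" : "tablet");
        return 0;
    }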
I'll throw this out as something where it has saved me literally weeks of work: debugging pathological behaviour in third-party code. Prompt example: "Today, when I did U, V, and W, I ended up with X happening. I fixed it by doing Y. The second time I tried, Z happened instead (which was the expected behaviour). Can you work out a plausible explanation for why X happened the first time and why Y fixed it? Please keep track of the specific lines of code where the behaviour difference shows up."
This is in a real-time stateful system, not a system where I'd necessarily expect the exact same thing to happen every time. I just wanted to understand why it behaved differently because there wasn't any obvious reason, to me, why it would.
The explanation it came back with was pretty wild. It essentially boiled down to a module not being adequately initialized before its first use, after which it maintained its state from then on. The narrative touched a lot of code, and the source references it provided did an excellent job of walking me through it. I independently validated the explanation using some telemetry data that the LLM didn't have access to. It was correct. This would have taken me a very long time to work out by hand.
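To make that concrete, here's a contrived sketch of the failure mode; none of these names come from the actual codebase:

    #include <stdio.h>

    /* Hypothetical module: a scale factor that only gets its
       real value from module_init(). */
    static double scale;              /* zero until init runs */

    static void module_init(void) {
        scale = 1.5;                  /* intended default */
    }

    static double process(double x) {
        /* Nothing here guards against scale never being set, so
           the first run silently uses 0.0. Once anything calls
           module_init(), the state sticks for every later run. */
        return x * scale;
    }

    int main(void) {
        printf("first run:  %.1f\n", process(10.0)); /* 0.0  -> the surprising X */
        module_init();                               /* the "fix" Y */
        printf("second run: %.1f\n", process(10.0)); /* 15.0 -> the expected Z */
        return 0;
    }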
Edit: I have done this multiple times and have been blown away each time.
This seems to be a common denominator for what LLMs actually do well: finding bugs and explaining code. Whether they can reliably produce code remains to be seen.
> Prompt example: "Today, when I did U, V, and W, I ended up with X happening. I fixed it by doing Y. The second time I tried, Z happened instead (which was the expected behaviour). Can you work out a plausible explanation for why X happened the first time and why Y fixed it? Please keep track of the specific lines of code where the behaviour difference shows up."
> The explanation it came back with was pretty wild. It essentially boiled down to a module not being adequately initialized before its first use, after which it maintained its state from then on.
Even without knowing any of the variable values, that explanation doesn't sound wild at all to me. It sounds in fact entirely plausible, and very much like what I'd expect the right answer to sound like.
The wild part, for me at the time, was how many steps there were between cause and effect and how cleanly they'd been reasoned through. The first time I had that experience was my first real "this LLM stuff might have legs" moment. My second similar experience, several days later, was "hmm, that wasn't a fluke..."
I'm still at a stage where I'm not completely sure that I like the code that Codex or Claude wants to write. Sometimes it's good, sometimes it takes 5 or 6 iterations to get it somewhere I'm happy with. But wow, on the front end of the work, they are great design/review/iterate partners; sometimes I let the tools write the first draft and then I find the gaps, sometimes I write the first draft and let the tools find the gaps. Either way has worked really well for making solid debt-free progress.
Just to contextualize this... https://lmcache.ai/kv_cache_calculator.html. They only have smaller open models, but for Qwen3-32B with 50k tokens it comes up with 7.62GB for the KV cache. Imagining a 900k-token session with, say, Opus, I think it'd be pretty unreasonable to flush that to the client after it's been idle for an hour.
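For a back-of-the-envelope version, the usual formula is 2 (K and V) x layers x KV heads x head dim x tokens x bytes per element. The Qwen3-32B-ish config numbers below are my assumptions, not the calculator's; it lands on a different figure, so it presumably assumes a different config or cache precision:

    #include <stdio.h>

    int main(void) {
        /* Assumed config: 64 layers, 8 KV heads (GQA), head_dim
           128, fp16 cache. Swap in real values for other models. */
        double layers = 64, kv_heads = 8, head_dim = 128;
        double tokens = 50000, bytes_per_elem = 2;  /* fp16 */
        double bytes = 2 * layers * kv_heads * head_dim
                         * tokens * bytes_per_elem;
        printf("KV cache: %.2f GiB\n", bytes / (1024.0 * 1024 * 1024));
        return 0;
    }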
While I’m not at all surprised that they’re still running, I am a little surprised at how many Farmall owners are on HN. Farmall H owner checking in :)