vibe42 · 2026-05-20T14:02:49 1779285769

I'm using the pi-mono coding agent (open source, free) without any extensions and very simple prompts. The 3.6 27B model (BF16, 250k context) uses 67GB VRAM on an RTX PRO 9000.

It's very capable on almost any coding task I've thrown at it, and very good for scripts and new code bases of easy-to-medium difficulty.

It struggles on some complex tasks in larger code bases. E.g. when using it to debug and fix bugs in llama.cpp, it gets close to working code but often introduces errors. For such tasks it's still very useful as a search/explore tool and for drafting fixes.
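As a rough sanity check on the 67GB figure above, here is a minimal sketch of the arithmetic. It assumes GB means 10^9 bytes and that everything beyond the weights (KV cache for the long context, activations, runtime buffers) is lumped into one remainder; the function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bytes_per_param

# BF16 stores 2 bytes per parameter.
bf16_weights = weight_memory_gb(27, 2.0)

# Whatever is left of the reported 67GB goes to KV cache, activations and buffers.
remainder = 67 - bf16_weights

print(f"weights: {bf16_weights:.0f} GB, remainder: {remainder:.0f} GB")
# prints "weights: 54 GB, remainder: 13 GB"
```

So about 54GB of the total is weights, and the remaining ~13GB is what the context and runtime overhead take up in this setup.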

vibe42 · 2026-05-18T16:31:51 1779121911

Something I've had good progress with, using local models and simple open-source harnesses, is repeating simple verification prompts, each in a new context.

I'd run the following 5-10 times with one model, then again with a 2nd model.

"Verify the correctness and completeness of all security configs/rules in SETUP.md. Consider if anything is missing, and if anything is not needed. Do not modify any files; only write potential findings to report.txt"

"Verify all findings and claims in report.txt."

Replace "SETUP.md" with whatever you're working on.

It's both terrifying and incredible watching what the models get correct and what they get completely wrong.

However, after enough runs they tend to settle on a state they claim does not need any more edits. That result is generally useful, with far fewer errors/hallucinations than a single run.
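The loop described above could be sketched roughly like this. It assumes the two prompts alternate within each run, and `run_in_fresh_context` is a hypothetical callable standing in for however your harness starts a new session with a given model; it is not part of any real agent's API.

```python
from typing import Callable, List, Tuple

AUDIT = (
    "Verify the correctness and completeness of all security configs/rules in {target}. "
    "Consider if anything is missing, and if anything is not needed. "
    "Do not modify any files; only write potential findings to report.txt"
)
RECHECK = "Verify all findings and claims in report.txt."

def build_schedule(models: List[str], target: str, runs_per_model: int) -> List[Tuple[str, str]]:
    """List every (model, prompt) pair in the order it should be run."""
    schedule = []
    for model in models:
        for _ in range(runs_per_model):
            schedule.append((model, AUDIT.format(target=target)))
            schedule.append((model, RECHECK))
    return schedule

def run_all(schedule: List[Tuple[str, str]],
            run_in_fresh_context: Callable[[str, str], None]) -> int:
    """Run each prompt in its own new context; returns the number of runs."""
    for model, prompt in schedule:
        run_in_fresh_context(model, prompt)  # hypothetical: one new session per call
    return len(schedule)
```

For example, `build_schedule(["model-a", "model-b"], "SETUP.md", 5)` yields 20 prompts: five audit/recheck pairs for the first model, then the same for the second.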

mrshu · 2026-05-18T22:53:49 1779144829

I have also had positive experiences doing this multiple times across multiple model families, and then recursively having the fixes reviewed too.

It's called review-anvil and it does find a significant number of problems that might pop up:

https://github.com/mrshu/agent-skills/#review-anvil

knollimar · 2026-05-18T22:29:18 1779143358

Don't you think "consider if anything is missing" leads them into adding something, given the sycophancy from RL training, and "if anything is not needed" into removing something?

Or does "verify all claims in report" counteract that?

vibe42 · 2026-05-19T14:03:16 1779199396

It can indeed cause some models to try too hard to come up with stuff, but the next verification prompt does counteract it.

E.g. some findings first classified as moderate priority often get reclassified as low priority even if the finding itself is correct.

The exact phrasing doesn't seem to matter as much as keeping the prompts short, simple and to the point.

However some models seem to do a bit better when adding ", if any" to prompts such as "List potential improvements".

vibe42 · 2026-05-14T15:17:07 1778771827

Many crypto wallets use a key derivation function (KDF) to add an amount of computation (and memory usage) per password tried - to mitigate brute force of weak passwords.

The increase in available compute (and the resulting decrease in brute-force cost), combined with price increases in many crypto tokens, means brute-forcing old wallets can become worth it years after passwords were forgotten.

And of course even smaller, local AI models can now easily write optimized scripts to brute-force any given KDF.
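To make the cost-per-guess idea concrete, here is a minimal sketch using PBKDF2 from Python's standard library. The salt, iteration count, candidate list and the direct comparison against a stored key are all assumptions for illustration; real wallets use their own KDF, parameters and verification step (often a decryption check).

```python
import hashlib
import hmac
from typing import Iterable, Optional

def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """One KDF evaluation; the iteration count sets the cost of every guess."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

def brute_force(candidates: Iterable[str], salt: bytes, iterations: int,
                target_key: bytes) -> Optional[str]:
    """Try each candidate; every attempt pays the full KDF cost."""
    for guess in candidates:
        if hmac.compare_digest(derive_key(guess, salt, iterations), target_key):
            return guess
    return None

# Illustrative values only.
salt = b"example-salt"
iterations = 10_000
stored = derive_key("hunter2", salt, iterations)

found = brute_force(["123456", "password", "hunter2"], salt, iterations, stored)
print(found)  # prints "hunter2"
```

Raising `iterations` slows the legitimate unlock and every brute-force guess by the same factor, which is why parameters chosen years ago can look weak on newer hardware.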

ndr · 2026-05-14T15:48:08 1778773688

how can that possibly work while supporting offline backup & restore?

_ache_ · 2026-05-14T17:12:16 1778778736

The compute needed used to be on the order of 5s per password try. So it effectively mitigated brute force back then; you needed an absurd amount of compute power to crack them.

Moore's law did its thing, now you can do it with a lot less compute power.
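A back-of-the-envelope version of that argument, as a sketch: the 5s per try comes from the comment above, while the candidate count, the 100x per-try speedup and the 8 parallel workers are made-up numbers chosen only to show the shape of the calculation.

```python
def search_time_seconds(num_candidates: int, seconds_per_try: float,
                        parallel_workers: int = 1) -> float:
    """Worst-case time to try every candidate password."""
    return num_candidates * seconds_per_try / parallel_workers

candidates = 1_000_000  # assumed size of the password guess list

# Original hardware: 5s per try, one worker.
then = search_time_seconds(candidates, 5.0)

# Hypothetical newer hardware: 100x faster per try, 8 workers in parallel.
now = search_time_seconds(candidates, 5.0 / 100, parallel_workers=8)

print(f"then: {then / 86400:.1f} days")   # prints "then: 57.9 days"
print(f"now:  {now / 3600:.1f} hours")    # prints "now:  1.7 hours"
```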

IIsi50MHz · 2026-05-20T02:55:21 1779245721

> Moore's law did its thing, now you can do it with a lot less compute power.

s/power/time/ maybe? Or on second thought: so energy-efficient that it actually uses less power in the same-or-shorter time… which brings me back to "less compute power".

vibe42 · 2026-04-24T13:23:31 1777037011

I run both MoE and dense models on laptops.

One set of models runs on 8GB VRAM / 16GB RAM and another set runs on 24GB VRAM / 64GB RAM. They are very useful for easy and easy-to-moderately complex code, respectively.

The latest open, small models are incredibly useful even at smaller sizes when configured properly (quant size, sampling params, careful use of context etc).

vibe42 · 2026-04-24T02:16:03 1776996963

https://old.reddit.com/r/LocalLLaMA/

Bit of a hype madhouse whenever a new model is released, but it's pretty easy to filter out simple hype from people showing reproducible experiments, specific configs for llama.cpp, github links etc.

vibe42 · 2026-04-23T18:37:16 1776969436

Outside Trading.

vibe42 · 2026-04-22T19:57:19 1776887839

This. And when possible, first asking the AI to add more granular logging around the code where the problem is - then re-run the code and feed the new log in a new context.

I've used this to debug some moderately complex bugs in golang and godot code and it works really well - the combo of having a new context with the (sometimes overly) granular debug logging and only the required, specific source code.
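As an illustration of what "more granular logging" can look like (in Python here rather than Go or Godot), this is a minimal sketch. The `parse_record` function and its input are invented for the example, and an in-memory buffer stands in for the log file you would paste into a new context.

```python
import io
import logging

log = logging.getLogger("debug_trace")

def setup_trace(stream) -> None:
    """Send DEBUG-level messages, tagged with function and line, to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(funcName)s:%(lineno)d %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)

def parse_record(line: str) -> dict:
    """Hypothetical function under investigation, with a log line at each step."""
    log.debug("raw line=%r", line)
    parts = line.strip().split(",")
    log.debug("split into %d parts: %r", len(parts), parts)
    record = {"name": parts[0], "qty": int(parts[1])}
    log.debug("result=%r", record)
    return record

buffer = io.StringIO()        # stand-in for a log file
setup_trace(buffer)
parse_record("widget,3\n")
trace = buffer.getvalue()     # this text, plus only the relevant source, goes in the new context
```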

vibe42 · 2026-04-22T19:54:17 1776887657

Keep it simple and run a fresh, new context for each prompt.

I use the pi-mono coding agent with several different new open models running locally.

The simpler and more precise the prompt the better it works. Some examples:

"Review all golang code files in this folder. Look for refactor opportunities that make the code simpler, shorter, easier to understand and easier to maintain, while not changing the logic, correctness or functionality of the code. Do not modify any code; only describe potential refactor changes."

After it lists a bunch of potential changes, it's then enough to write "Implement finding 4. XYZ" and sometimes add "Do not make any other changes" to keep the resulting agent actions focused.

vibe42 · 2026-04-22T19:40:33 1776886833

With the pi-mono coding agent (running local, open models) this works very well:

"Do not modify any code; only describe potential changes."

I often add it to the end when prompting to e.g. review code for potential optimizations or refactor changes.

vibe42 · 2026-04-22T16:04:33 1776873873

Q4-Q5 quants of this model run well on gaming laptops with 24GB VRAM and 64GB RAM. You can get one of those for around $3,500.

Interesting pros/cons vs the new Macbook Pros depending on your prefs.

And Linux runs better than ever on such machines.

doix · 2026-04-22T16:07:38 1776874058

What laptop has that much VRAM and RAM for $3500 with good/okay-ish Linux support? I was looking to upgrade my asus zephyrus g14 from 2021 and things were looking very expensive. Decided to just keep it chugging along for another year.

Then again, I was looking in the UK, maybe prices are extra inflated there.

green7ea · 2026-04-22T18:54:41 1776884081

I got a HP g1a for about 3k€ with 64gb of ram when it came out

kroaton · 2026-04-22T16:09:32 1776874172

A3B-35B is better suited for laptops with enough VRAM/RAM. This dense model however will be bandwidth limited on most cards.

The RTX 5090 mobile sits at 896GB/s, as opposed to the 1.8TB/s of the desktop 5090, and most mobile chips have far less bandwidth than that, so speeds won't be incredible across the board the way they are on desktop computers.
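A rough way to see why bandwidth matters: during decoding, each generated token has to read (roughly) all active weights once, so memory bandwidth divided by the size of the active weights gives a ceiling on tokens per second. The sketch below assumes a 27B dense model, 3B active parameters for the MoE, about 4 bits (0.5 bytes) per weight, and ignores KV cache reads, compute limits and anything spilling out of VRAM; real speeds will be lower.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, active_params_billions: float,
                            bytes_per_param: float) -> float:
    """Upper bound on decode speed if every token reads all active weights once."""
    gb_read_per_token = active_params_billions * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token

Q4_BYTES = 0.5  # assumption: ~4 bits per weight

for name, bandwidth in [("5090 mobile", 896), ("5090 desktop", 1800)]:
    dense = max_decode_tokens_per_s(bandwidth, 27, Q4_BYTES)  # assumed 27B dense model
    moe = max_decode_tokens_per_s(bandwidth, 3, Q4_BYTES)     # assumed 3B active (A3B)
    print(f"{name}: dense <= {dense:.0f} tok/s, MoE <= {moe:.0f} tok/s")
```

Under these assumptions the MoE ceiling is 9x the dense one on the same card, simply because it reads 9x fewer weights per token.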

jadbox · 2026-04-22T16:19:43 1776874783

I find A3B-35B to be an ideal model for small local projects - definitely the best for me so far