I'm using the pi-mono coding agent (open source, free) without any extensions and very simple prompts. The 3.6 27B model (BF16, 250k context) uses 67GB VRAM on an RTX PRO 9000.
It's very capable on almost any coding task I've thrown at it, and very good for easy-to-medium-difficulty scripts and new code bases.
It struggles on some complex tasks in larger code bases, e.g. when using it to debug and fix bugs in llama.cpp it gets close to working code but often introduces errors. For such tasks it's still very useful as a search/explore tool and for drafting fixes.
Something I've had good progress with using local models and simple open-source harnesses is to repeat, in a new context, simple verification prompts.
I'd run the following 5-10 times with one model, then again with a 2nd model.
"Verify the correctness and completeness of all security configs/rules in SETUP.md. Consider if anything is missing, and if anything is not needed. Do not modify any files; only write potential findings to report.txt"
"Verify all findings and claims in report.txt."
Replace "SETUP.md" with whatever you're working on.
It's both terrifying and incredible watching what the models get correct and what they get completely wrong.
However, after enough runs they tend to settle on a state they claim does not need any more edits. And that result is generally useful, with far fewer errors/hallucinations than a single run.
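A minimal sketch of that repetition, assuming the two prompts are run as a pair and that each prompt gets its own new context. `run_in_fresh_context` is a hypothetical callable you would wire to your own harness (it is not part of any agent's API), and the model names are placeholders:

```python
from typing import Callable

FIND_PROMPT = (
    "Verify the correctness and completeness of all security configs/rules in {target}. "
    "Consider if anything is missing, and if anything is not needed. "
    "Do not modify any files; only write potential findings to report.txt"
)
VERIFY_PROMPT = "Verify all findings and claims in report.txt."


def build_schedule(target: str, models: list[str], runs_per_model: int) -> list[tuple[str, str]]:
    """Return the (model, prompt) pairs in the order they would be run."""
    schedule = []
    for model in models:
        for _ in range(runs_per_model):
            schedule.append((model, FIND_PROMPT.format(target=target)))
            schedule.append((model, VERIFY_PROMPT))
    return schedule


def run_schedule(
    schedule: list[tuple[str, str]],
    run_in_fresh_context: Callable[[str, str], None],
) -> int:
    """Run each prompt via the supplied callable, one new context per prompt.

    Returns the number of prompts that were run.
    """
    count = 0
    for model, prompt in schedule:
        # Hypothetical hook: start the agent with no prior history for this call.
        run_in_fresh_context(model, prompt)
        count += 1
    return count
```

With two models and 5 runs each, `build_schedule("SETUP.md", ["model-a", "model-b"], 5)` yields 20 prompts, all for the first model before any for the second.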
Don't you think that, given sycophancy from RL training, "consider if anything is missing" leads them to add something and "if anything is not needed" leads them to remove something?
Or does "verify all claims in report" counteract that?
Many crypto wallets use a key derivation function (KDF) to add an amount of computation (and memory usage) per password tried - to mitigate brute force of weak passwords.
The increase in compute (decrease in brute-force cost) combined with price increases in many crypto tokens means brute-forcing old wallets can become worth it years after passwords were forgotten.
And of course even smaller, local AI models can now easily write optimized scripts to brute-force any given KDF function.
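To make the per-guess cost concrete, here is a rough sketch using PBKDF2-HMAC-SHA256 from Python's standard library. PBKDF2 only stands in for whatever KDF a given wallet actually uses, and the salt, iteration count and candidate list are made-up illustration values:

```python
import hashlib
import time
from typing import Iterable, Optional


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key; a higher iteration count means more work per guess."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)


def brute_force(
    target_key: bytes, salt: bytes, iterations: int, candidates: Iterable[str]
) -> Optional[str]:
    """Try each candidate password; every try pays the full KDF cost."""
    for guess in candidates:
        if derive_key(guess, salt, iterations) == target_key:
            return guess
    return None


def seconds_per_guess(iterations: int, samples: int = 3) -> float:
    """Measure the average wall-clock cost of one derivation on this machine."""
    salt = b"benchmark-salt"  # illustration value only
    start = time.perf_counter()
    for i in range(samples):
        derive_key(f"guess-{i}", salt, iterations)
    return (time.perf_counter() - start) / samples
```

Multiplying `seconds_per_guess(...)` by the size of a candidate list gives a rough worst-case time for one machine; faster hardware shrinks that number while the wallet's KDF settings stay fixed.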
The computation needed used to be on the order of 5s per password try.
So it effectively mitigated brute force back then; you needed an absurd amount of compute power to crack them.
Moore's law did its thing, now you can do it with a lot less compute power.
> Moore's law did its thing, now you can do it with a lot less compute power.
s/power/time/ maybe? Or on second thought: so energy-efficient that it actually uses less power in the same-or-shorter time… which brings me back to "less compute power".
One set of models runs on 8GB VRAM / 16GB RAM and another set runs on 24GB VRAM / 64GB RAM. They are very useful for easy and easy-to-moderately-complex code, respectively.
The latest open, small models are incredibly useful even at smaller sizes when configured properly (quant size, sampling params, careful use of context etc).
Bit of a hype madhouse whenever a new model is released, but it's pretty easy to separate simple hype from people showing reproducible experiments, specific configs for llama.cpp, GitHub links etc.
This. And when possible, first asking the AI to add more granular logging around the code where the problem is - then re-run the code and feed the new log in a new context.
I've used this to debug some moderately complex bugs in golang and godot code and it works really well - the combo of having a new context with the (sometimes overly) granular debug logging and only the required, specific source code.
Keep it simple and run a fresh, new context for each prompt.
I use the pi-mono coding agent with several different new open models running locally.
The simpler and more precise the prompt the better it works. Some examples:
"Review all golang code files in this folder. Look for refactor opportunities that make the code simpler, shorter, easier to understand and easier to maintain, while not changing the logic, correctness or functionality of the code. Do not modify any code; only describe potential refactor changes."
After it lists a bunch of potential changes, it's then enough to write "Implement finding 4. XYZ" and sometimes add "Do not make any other changes" to keep the resulting agent actions focused.
What laptop has that much VRAM and RAM for $3500 with good/okay-ish Linux support? I was looking to upgrade my ASUS Zephyrus G14 from 2021 and things were looking very expensive. Decided to just keep it chugging along for another year.
Then again, I was looking in the UK, maybe prices are extra inflated there.
A3B-35B is better suited for laptops with enough VRAM/RAM.
This dense model however will be bandwidth limited on most cards.
The RTX 5090 mobile sits at 896GB/s, as opposed to the 1.8TB/s of the desktop 5090, and most mobile chips have far less bandwidth than that, so speeds won't be as good across the board as they are on desktop computers.
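A back-of-the-envelope way to see the bandwidth limit: during decoding, each generated token has to read roughly all active weights once, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below assumes a 27B dense model at about 4 bits (0.5 bytes) per parameter and ignores KV-cache reads, compute and overhead, so real speeds will be lower:

```python
def max_decode_tokens_per_second(
    bandwidth_gb_s: float, active_params_billion: float, bytes_per_param: float
) -> float:
    """Upper bound on decode speed if every token reads all active weights once."""
    weight_gb = active_params_billion * bytes_per_param
    return bandwidth_gb_s / weight_gb


# Assumed example values: 27B dense parameters, ~0.5 bytes per parameter (4-bit quant).
mobile_dense = max_decode_tokens_per_second(896, 27, 0.5)
desktop_dense = max_decode_tokens_per_second(1792, 27, 0.5)

# A mixture-of-experts model with ~3B active parameters reads far less per token.
mobile_moe = max_decode_tokens_per_second(896, 3, 0.5)

print(f"dense, mobile:  {mobile_dense:.0f} tok/s upper bound")
print(f"dense, desktop: {desktop_dense:.0f} tok/s upper bound")
print(f"MoE, mobile:    {mobile_moe:.0f} tok/s upper bound")
```

The bound scales linearly with bandwidth, which is why the same dense model tops out at half the speed on the mobile part, and why a model with few active parameters suffers less from it.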