Hacker News | cstaszak's comments

I'm a fan of "civilization in a box" kinds of projects. However, the ZIM file format leaves a lot to be desired in 2026. I've been exploring a refreshed, alternative approach: https://github.com/stazelabs/oza

I do think having an LLM as an optional "sidecar" is a useful approach. If you can run a meaningful Ollama instance alongside your content, great!
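As a sketch of what "sidecar" means in practice: Ollama exposes a local HTTP API (by default on port 11434), so the reader app only needs to build a small JSON request against the `/api/generate` endpoint. The model name and host below are placeholders; any locally pulled model works.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434"):
    """Build an HTTP request for Ollama's /api/generate endpoint."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete JSON reply, not a stream
    }).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def ask_sidecar(model: str, prompt: str) -> str:
    """Send the prompt to a locally running Ollama instance and return its reply."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

The key design point is that this is all the coupling there is: if no Ollama instance is running, the request simply fails and the content remains fully usable on its own.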


ZIM or not, I think the “LLM as optional sidecar” part is the right idea.

The durable asset is the knowledge base itself. A local model can be useful on top, but it should stay a layer, not become the dependency.


Even with that setup I have unfortunately had a bad experience just using Qwen2.5-27B. I once asked it to take a large PDF of a book and quote every passage that mentioned food. After churning for a long time, it gave me several interesting excerpts, only one of which was real; the rest were hallucinations/confabulations.
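One cheap guard against exactly this failure mode: since a quote claims to be verbatim, you can mechanically check each model-supplied excerpt against the extracted source text. A minimal sketch (whitespace-normalized substring matching; the function names are mine, not from any particular tool):

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace so PDF line breaks don't cause false negatives."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(source_text: str, quotes: list[str]) -> dict[str, bool]:
    """Map each model-supplied quote to whether it appears verbatim in the source."""
    haystack = normalize(source_text)
    return {q: normalize(q) in haystack for q in quotes}
```

This only catches fabricated quotes, not missed ones, but it would have flagged all but one of the excerpts in the anecdote above.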

I hope we can get to the point where even a small distilled model at the 7B-30B level avoids hallucinating.


Qwen2.5 is quite old at this point. The new Qwen3.5 series is good, and it has a 27B dense model too. I have to double-check its output, but I've gotten surprising results even out of the 4B model. They're also vision-enabled and pretty good at OCR. These were released in just the last few weeks.

