
Not as simple as that. Everyone would happily use local, but the issue is that local sucks.


https://github.com/brontoguana/krasis

On my desktop with an RTX 5060 Ti (16 GB) and 96 GB of RAM, I routinely get 25-30 tokens/sec using an 80B model quantized to int8. It uses 65 GB of system RAM and 15 GB of VRAM.

And it's plenty fast for many of my purposes.

I could easily run a 30B model at bf16 (full precision) and get around 50 tok/s.
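The memory figures above follow from simple per-parameter arithmetic: int8 is one byte per parameter, bf16 is two. A rough sketch (ignoring KV cache and activation overhead, which add a few extra GB in practice):

```python
def model_mem_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * bytes_per_param

# 80B model at int8 (1 byte/param): ~80 GB total,
# matching the ~65 GB system RAM + ~15 GB VRAM split reported above.
print(model_mem_gb(80, 1))  # 80

# 30B model at bf16 (2 bytes/param): ~60 GB, still within 96 GB RAM + 16 GB VRAM.
print(model_mem_gb(30, 2))  # 60
```

So the 80B-int8 and 30B-bf16 configurations land at roughly the same total footprint, which is why both are feasible on the same box.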



