
Does anyone know a good low-latency LLM API provider? We tried Cerebras and Groq but they have no capacity right now. GPT models are too slow for us at the moment. Gemini is better but not really at the same level as GPT.



This depends a bit on your cost sensitivity and which model families you need, but Baseten and Fireworks have been my go-tos.

Currently Baseten shows ~610 ms TTFT (time to first token) and ~82 tok/s for Kimi K2.6, which is roughly 2x the throughput of GPT-5.4 (per their OpenRouter stats). GLM 5 is slightly slower on both metrics, but still strong.
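
If it helps, here is a minimal sketch of how you could measure TTFT and streaming throughput yourself against any OpenAI-compatible endpoint (Baseten, Fireworks, OpenRouter, etc.), since provider dashboards don't always match your region/load. The base_url, model id, and the chunk-counting approximation of tokens/s are placeholders you'd swap for your own provider:

    import os
    import time
    from openai import OpenAI  # pip install openai

    # Assumed OpenAI-compatible endpoint; swap base_url/model for your provider.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # example endpoint, not an endorsement
        api_key=os.environ["PROVIDER_API_KEY"],
    )

    start = time.perf_counter()
    first_token_at = None
    chunks = 0

    stream = client.chat.completions.create(
        model="moonshotai/kimi-k2",  # placeholder model id
        messages=[{"role": "user", "content": "Summarize TCP slow start in two sentences."}],
        stream=True,
    )

    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if not delta:
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first streamed token
        chunks += 1

    elapsed = time.perf_counter() - (first_token_at or start)
    print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
    # Chunk count only approximates tokens; check the usage field or a tokenizer for exact numbers.
    print(f"~{chunks / elapsed:.1f} chunks/s streamed")

Run it a few times at the hours you actually care about, since TTFT in particular swings a lot with provider load.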



