
Does anyone know a good low-latency LLM API provider? We tried Cerebras and Groq but they have no capacity right now. GPT models are too slow for us at the moment. Gemini is better but not really at the same level as GPT.



This depends a bit on your cost sensitivity and which model families you need, but Baseten and Fireworks have been my go-tos.

Currently Baseten shows ~610 ms TTFT (time to first token) and ~82 tok/s for Kimi K2.6, which is roughly 2x the throughput of GPT-5.4 (per their OpenRouter stats). GLM 5 is slightly slower on both metrics, but still strong.
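
If it helps, here is a minimal sketch of how you could measure TTFT and streaming throughput yourself against any OpenAI-compatible endpoint (Baseten, Fireworks, OpenRouter, etc.), since provider dashboards don't always match your region/load. The base_url, model id, and the chunk-counting approximation of tokens/s are placeholders you'd swap for your own provider:

    import os
    import time
    from openai import OpenAI  # pip install openai

    # Assumed OpenAI-compatible endpoint; swap base_url/model for your provider.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # example endpoint, not an endorsement
        api_key=os.environ["PROVIDER_API_KEY"],
    )

    start = time.perf_counter()
    first_token_at = None
    chunks = 0

    stream = client.chat.completions.create(
        model="moonshotai/kimi-k2",  # placeholder model id
        messages=[{"role": "user", "content": "Summarize TCP slow start in two sentences."}],
        stream=True,
    )

    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if not delta:
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first streamed token
        chunks += 1

    elapsed = time.perf_counter() - (first_token_at or start)
    print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
    # Chunk count only approximates tokens; check the usage field or a tokenizer for exact numbers.
    print(f"~{chunks / elapsed:.1f} chunks/s streamed")

Run it a few times at the hours you actually care about, since TTFT in particular swings a lot with provider load.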



