The impossibility of cost + latency analysis for LLMs
The LLM application world is moving so fast that any cost + latency analysis is bound to become outdated quickly. Matt Ross, a senior manager of applied research at Scribd, told me that the estimated API cost for his use cases has gone down by two orders of magnitude over the last six months. Latency has decreased significantly as well. Similarly, many teams have told me they feel they have to redo the feasibility estimation and the buy (using paid APIs) vs. build (using open source models) decision every week.