>It seems you haven't done the due diligence on what part of the API is expensiv...

kang · 2026-04-26T03:32:03 1777174323

You are right, I was wrong in my understanding there. It stemmed from my own implementation; an inference often wrote extra data such as tool call, so I was using it to preserve relevant information alongwith desired output, to be able to throw away the prompt every time. I realize inference caching is one better way (with its pros and cons).