> Across many application areas, we’ll be left with a choice of using a 90% accurate model we understand, or a 99% accurate model we don’t.
And how do you show that it is 99% accurate, short of writing so many automated tests that you could have written the procedural version instead?
I think what I was missing from this article is how to evaluate whether a given domain is one where neural nets or LLMs can be applied. Image-from-text generation is a great fit because accuracy isn't strictly defined. But telling ChatGPT "code this pacemaker for me" would have a real accuracy attached to it, one you could confirm with unit tests.
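To make the irony concrete, here is a minimal sketch (names and toy predictors are hypothetical, not from the article) of estimating a model's accuracy by comparing it against a procedural oracle on sampled inputs. The catch is exactly the one raised above: the `reference` function you need as ground truth *is* the procedural version you were trying to avoid writing.

```python
import random

def estimate_accuracy(model, reference, inputs):
    """Fraction of sampled inputs where the model agrees with the oracle."""
    hits = sum(1 for x in inputs if model(x) == reference(x))
    return hits / len(inputs)

# Hypothetical stand-ins: the oracle is a trivial parity check,
# and the "model" disagrees with it on a single input.
reference = lambda x: x % 2 == 0
model = lambda x: (x % 2 == 0) if x != 7 else True

random.seed(0)
sample = [random.randrange(100) for _ in range(1000)]
print(estimate_accuracy(model, reference, sample))
```

Sampling only *estimates* accuracy; for something like a pacemaker you'd want exhaustive or property-based testing over the safety-critical input space, which again presupposes a precise specification.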