
Sets a new record on the Extended NYT Connections benchmark: 96.8 (https://github.com/lechmazur/nyt-connections/).

Grok 4 is at 92.1, GPT-5 Pro at 83.9, Claude Opus 4.1 Thinking 16K at 58.8.

Gemini 2.5 Pro scored 57.6, so this is a huge improvement: 39.2 points.


