axiom92 on Oct 18, 2023 | on: Fuyu-8B: A multimodal architecture for AI agents

Right, but no separate image encoder + half the size could be very helpful for many applications.

GaggiX on Oct 18, 2023

The 7B LLaVA model is smaller, even considering the image encoder (CLIP-L).
