Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
ekelsen
on Oct 18, 2023
|
parent
|
context
|
favorite
| on:
Fuyu-8B: A multimodal architecture for AI agents
Image patches are projected directly into an embedding that goes into the decoder Transformer. The same thing could be done for audio.
Consider applying for YC's Summer 2026 batch! Applications are open till May 4
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: