Tag: vision

All articles tagged "vision".

llm-concepts
27 Apr, 2026 · 7 min read

Multimodality: Teaching Models to See and Hear

A multimodal model is not many models in a trench coat. It is one transformer trained to treat pixels, audio, and text as the same kind of thing.