Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
Description
Sorry this one was late! Thanks for bearing with me, and keep sending feedback my way. Still a year or two away from when I have time to record these, but I would love to.
Open-source tools, examples, limits, and the state of training multimodal models.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/molmo-and-llama-3-vision
00:00 Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
02:47 Llama vision: Multimodality for the masses of developers
03:27 Molmo: a (mostly) open-source equivalent to Llama vision
08:45 How adding vision changes capabilities and reasoning
11:47 Multimodal language models: Earlier on the exponential
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_013.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_015.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_021.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_023.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_027.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_037.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_046.png
Fig 9: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_048.png
Fig 10: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_050.png
Fig 11: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_052.png
Fig 12: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_054.png
Fig 13: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_058.png
Fig 14: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_065.png