Why Nvidia builds open models with Bryan Catanzaro
Description
One of the big stories of 2025 for me was how Nvidia massively stepped up their open model program — more releases, higher-quality models, and joining the small handful of companies releasing datasets. In this interview, I sat down with one of the three VPs leading this 500+ person technical effort, Bryan Catanzaro, to discuss:
* Their very impressive Nemotron 3 Nano model released in Dec. 2025, and the bigger Super and Ultra variants coming soon,
* Why Nvidia’s business clearly benefits from them building open models,
* How the Nemotron team culture was crafted in pursuit of better models,
* Megatron-LM and the current state of open-source training software,
* Career reflections and paths into AI research,
* And other topics.
The biggest takeaway I had from this interview is how Nvidia understands their role as a company that can both build open language models and directly capture the value those models create, giving them a uniquely sustainable advantage.
Bryan has a beautiful analogy for open models this early in AI’s development, and how they are a process of creating “potential energy” for AI’s future applications.
I hope you enjoy it!
Guest: Bryan Catanzaro, VP Applied Deep Learning Research (ADLR), NVIDIA. X: @ctnzr, LinkedIn, Google Scholar.
Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.
Nemotron Model Timeline
2019–2022 — Foundational Work
* Megatron-LM (model parallelism framework that has become very popular again recently; alternatives: DeepSpeed, PyTorch FSDP; a sketch of the tensor-parallel idea follows this list).
* NeMo Framework (NVIDIA’s end-to-end LLM stack: training recipes, data pipelines, evaluation, deployment).
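Megatron-LM's core trick is tensor (model) parallelism: the weight matrices inside each transformer block are sharded across GPUs, with the first MLP projection split by columns and the second by rows, so the nonlinearity runs locally and only one all-reduce is needed per block. Below is a minimal single-process sketch of that partitioning scheme, not Megatron's actual API; the shapes and shard count are arbitrary illustration values.

```python
# Illustrative sketch of the tensor (model) parallelism idea popularized by
# Megatron-LM: the first MLP projection is split column-wise and the second
# row-wise, so each "device" holds a shard and only one reduction is needed.
# This simulates two shards in one process for clarity; it is not Megatron's API.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_ff, n_shards = 8, 16, 2

W1 = torch.randn(d_model, d_ff)   # first MLP projection (full, unsharded)
W2 = torch.randn(d_ff, d_model)   # second MLP projection (full, unsharded)
x = torch.randn(4, d_model)       # a small batch of activations

# Reference computation as it would run on a single device.
ref = F.gelu(x @ W1) @ W2

# Shard W1 by columns and W2 by rows (Megatron's MLP partitioning scheme).
W1_shards = W1.chunk(n_shards, dim=1)   # each: (d_model, d_ff / n_shards)
W2_shards = W2.chunk(n_shards, dim=0)   # each: (d_ff / n_shards, d_model)

# Each shard computes its partial output independently; summing the partials
# plays the role of the all-reduce across tensor-parallel ranks.
partials = [F.gelu(x @ w1) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]
out = sum(partials)

print(torch.allclose(ref, out, atol=1e-5))  # True: sharded result matches
```

The same column/row split is applied to the attention projections, which is what lets a single layer be spread across many GPUs.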
Nov 2023 — Nemotron-3 8B: Enterprise-ready NeMo models. Models: base, chat-sft, chat-rlhf, collection. Blog.
Feb 2024 — Nemotron-4 15B: Multilingual LLM trained to 8T tokens. Paper.
Jun 2024 — Nemotron-4 340B: Major open release detailing their synthetic data pipeline. Paper, blog. Models: Instruct, Reward.
Jul–Sep 2024 — Minitron / Nemotron-Mini: First of their pruned models, pruned from 15B. Minitron-4B (base model), Nemotron-Mini-4B-Instruct. Paper, code.
Oct 2024 — Llama-3.1-Nemotron-70B: Strong post-training on Llama 3.1 70B. Model, collection. Key dataset — HelpSteer2, paper.
Mar–Jun 2025 — Nemotron-H: First hybrid Mamba-Transformer models for inference efficiency. Paper, research page, blog. Models: 8B, 47B, 4B-128K.
May 2025 — Llama-Nemotron: Efficient reasoning models built on top of Llama (still!). Paper.
Sep 2025 — Nemotron Nano 2: 9B hybrid reasoning model with continued performance improvements. 12B base trained on 20T tokens (FP8 training), pruned to 9B for post-training. Report, V2 collection.
Nov 2025 — Nemotron Nano V2 VL: 12B VLM. Report.
Dec 2025 — Nemotron 3: Nano/Super/Ultra family, hybrid MoE, up to 1M context. Super/Ultra coming H1 2026. Nano: 25T tokens, 31.6B total / ~3.2B active parameters, released with recipes + code + datasets. Papers: White Paper, Technical Report. Models: Nano-30B-BF16, Base, FP8.
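For readers unfamiliar with the "total vs. active" distinction in mixture-of-experts models: every expert's weights are stored, but each token is only routed through a few of them, so compute cost tracks the active count. The sketch below shows the arithmetic with hypothetical expert sizes and counts; only the 31.6B / ~3.2B figures come from the release notes above, the rest is illustrative.

```python
# Rough illustration of how MoE "total vs. active" parameter counts arise.
# The expert count, expert size, and top-k below are HYPOTHETICAL placeholders,
# not the published Nemotron 3 Nano configuration; only the 31.6B-total /
# ~3.2B-active figures come from the release notes above.
def moe_params(shared_params, n_experts, expert_params, top_k):
    """Return (total, active) parameter counts for a simple MoE layer stack."""
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params  # only top_k experts run per token
    return total, active

# Example: 1.6B of shared (attention/embedding/Mamba) weights plus 60 experts
# of 0.5B each, routing each token to 3 experts.
total, active = moe_params(shared_params=1.6e9, n_experts=60,
                           expert_params=0.5e9, top_k=3)
print(f"total ~ {total / 1e9:.1f}B, active ~ {active / 1e9:.1f}B")
# -> total ~ 31.6B, active ~ 3.1B, in the ballpark of the Nano figures
```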
Nemotron’s Recent Datasets
NVIDIA began releasing substantially more data in 2025, including pretraining datasets — making them one of the few organizations releasing high-quality pretraining data at scale (which comes with non-negligible legal risk).
Pretraining Data
Collection — CC-v2, CC-v2.1, CC-Code-v1, Code-v2, Specialized-v1, CC-Math-v1. Math paper: arXiv:2508.15096.
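A minimal sketch of streaming one of these pretraining sets from the Hugging Face Hub without downloading it in full; the dataset id below is an assumption based on the naming above, so check the linked collection for the exact repo and any required config name.

```python
# Minimal sketch: stream a Nemotron pretraining dataset from the Hugging Face Hub.
# The repo id below is an ASSUMPTION based on the names listed above ("CC-v2");
# confirm the exact dataset id in the linked collection before running.
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-CC-v2", split="train", streaming=True)
for i, example in enumerate(ds):
    # Peek at the first few records without materializing the full dataset.
    print({k: str(v)[:80] for k, v in example.items()})
    if i >= 2:
        break
```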
Post-Training Data
Core post-training dumps (SFT/RL blends):
* Llama Nemotron Post-Training v1.1 (Apr 2025)
* Nemotron Post-Training v1 (Jul 2025)
* Nemotron Post-Training v2 (Aug 2025)
2025 reasoning/code SFT corpora:
* OpenMathReasoning (Apr 2025)
* OpenCodeReasoning (Apr 2025), OpenCodeReasoning-2 (May 2025)
* AceReason-1.1-SFT (Jun 2025)
* Nemotron-Math-HumanReasoning (Jun 2025), Nemotron-PrismMath (Apr 2025)
NeMo Gym RLVR datasets: Collection
Nemotron v3 post-training (Dec 2025): Collection