Why Nvidia builds open models with Bryan Catanzaro

Update: 2026-02-04

Description

One of the big stories of 2025 for me was how Nvidia massively stepped up their open model program: more releases, higher-quality models, and joining the small handful of companies releasing datasets. In this interview, I sat down with Bryan Catanzaro, one of the three VPs leading this effort of 500+ technical staff, to discuss:

* Their very impressive Nemotron 3 Nano model released in Dec. 2025, and the bigger Super and Ultra variants coming soon,

* Why Nvidia’s business clearly benefits from them building open models,

* How the Nemotron team culture was crafted in pursuit of better models,

* Megatron-LM and the current state of open-source training software,

* Career reflections and paths into AI research,

* And other topics.

The biggest takeaway I had from this interview is how Nvidia understands their unique role as a company that can both build open language models and directly capture the value those models create, giving them a uniquely sustainable advantage.

Bryan has a beautiful analogy for open models at this early stage of AI’s development: building them is a process of creating “potential energy” for AI’s future applications.

I hope you enjoy it!

Guest: Bryan Catanzaro, VP Applied Deep Learning Research (ADLR), NVIDIA. X: @ctnzr, LinkedIn, Google Scholar.

Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Nemotron Model Timeline

2019–2022 — Foundational Work

* Megatron-LM (model parallelism framework that has recently surged back into popularity; alternatives: DeepSpeed, PyTorch FSDP). See the tensor-parallelism sketch after this list.

* NeMo Framework (NVIDIA’s end-to-end LLM stack: training recipes, data pipelines, evaluation, deployment).
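
Megatron-LM’s core idea is tensor model parallelism: the MLP’s first weight matrix is split column-wise and the second row-wise across GPUs, so the nonlinearity stays local to each shard and a single all-reduce reassembles the output. Here is a minimal single-process sketch of that math in plain PyTorch; it is illustrative only and does not use Megatron-LM’s actual APIs.

```python
# Single-process illustration of Megatron-style tensor parallelism for an MLP.
# w1 is split by columns and w2 by rows, so each "rank" computes independently
# and the partial results only need to be summed (the all-reduce) at the end.
import torch

torch.manual_seed(0)
d_model, d_ff, world_size = 8, 32, 4

x = torch.randn(2, d_model)      # a tiny batch of activations
w1 = torch.randn(d_model, d_ff)  # up-projection
w2 = torch.randn(d_ff, d_model)  # down-projection

ref = torch.relu(x @ w1) @ w2    # reference: the unsharded computation

# Column-shard w1 and row-shard w2, one shard per rank. The column split
# keeps each hidden unit whole on one rank, so ReLU needs no communication.
w1_shards = w1.chunk(world_size, dim=1)
w2_shards = w2.chunk(world_size, dim=0)
partials = [torch.relu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]

out = sum(partials)              # in the real framework: torch.distributed.all_reduce
print(torch.allclose(out, ref, atol=1e-5))  # True
```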

Nov 2023 — Nemotron-3 8B: Enterprise-ready NeMo models. Models: base, chat-sft, chat-rlhf, collection. Blog.

Feb 2024 — Nemotron-4 15B: Multilingual LLM trained to 8T tokens. Paper.

Jun 2024 — Nemotron-4 340B: Major open release detailing their synthetic data pipeline. Paper, blog. Models: Instruct, Reward.

Jul–Sep 2024 — Minitron / Nemotron-Mini: Their first pruned models, compressed from the 15B base (see the width-pruning sketch below). Minitron-4B (base model), Nemotron-Mini-4B-Instruct. Paper, code.
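
Mechanically, width pruning means scoring each hidden unit and keeping only the top-k, shrinking both MLP projections. A hedged sketch follows; Minitron actually uses activation-based importance estimates plus distillation, so the weight-magnitude proxy here is a simplification, not their method.

```python
# Hedged sketch of MLP width pruning: rank hidden units by a simple
# importance proxy (weight magnitude) and keep the top-k, resizing
# both linear layers accordingly.
import torch
import torch.nn as nn

def prune_mlp_width(fc1: nn.Linear, fc2: nn.Linear, keep: int):
    """Return new (fc1, fc2) keeping only the `keep` most important hidden units."""
    scores = fc1.weight.norm(dim=1)              # one score per hidden unit
    keep_idx = scores.topk(keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep, fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep_idx])       # drop pruned rows
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep_idx])
        new_fc2.weight.copy_(fc2.weight[:, keep_idx])    # drop matching columns
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

fc1, fc2 = nn.Linear(512, 2048), nn.Linear(2048, 512)
fc1_small, fc2_small = prune_mlp_width(fc1, fc2, keep=1024)  # halve the width
```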

Oct 2024 — Llama-3.1-Nemotron-70B: Strong post-training on Llama 3.1 70B. Model, collection. Key dataset — HelpSteer2, paper.

Mar–Jun 2025 — Nemotron-H: First hybrid Mamba-Transformer models for inference efficiency. Paper, research page, blog. Models: 8B, 47B, 4B-128K.

May 2025 — Llama-Nemotron: Efficient reasoning models built on top of Llama (still!). Paper.

Sep 2025 — Nemotron Nano 2: 9B hybrid model for reasoning, continuing the family’s performance gains. A 12B base trained on 20T tokens (FP8 training) was pruned to 9B for post-training. Report, V2 collection.

Nov 2025 — Nemotron Nano V2 VL: 12B VLM. Report.

Dec 2025 — Nemotron 3: Nano/Super/Ultra family of hybrid MoE models with up to 1M context; Super and Ultra are due in H1 2026. Nano: trained on 25T tokens, 31.6B total parameters with ~3.2B active per token, released with recipes, code, and datasets (see the loading sketch below). Papers: White Paper, Technical Report. Models: Nano-30B-BF16, Base, FP8.
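
Since Nano is a sparse MoE, only ~3.2B of its 31.6B parameters are active per token, so per-token compute tracks the active count rather than the total. Below is a minimal sketch of loading the BF16 checkpoint with Hugging Face transformers; the repo id is my assumption, so check NVIDIA’s Hugging Face page for the exact name.

```python
# Hedged sketch: load the Nano BF16 release with transformers.
# The repo id below is an assumption, not confirmed from the post.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Nano-30B"  # hypothetical id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto",
    trust_remote_code=True,      # hybrid Mamba/MoE blocks may ship custom code
)

inputs = tok("Why does NVIDIA release open models?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=128)[0]))
```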

Nemotron’s Recent Datasets

NVIDIA began releasing substantially more data in 2025, including pretraining datasets, making them one of the few organizations releasing high-quality pretraining data at scale (which comes with non-negligible legal risk).

Pretraining Data

Collection: CC-v2, CC-v2.1, CC-Code-v1, Code-v2, Specialized-v1, CC-Math-v1. Math paper: arXiv:2508.15096.
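
These corpora are far too large to download casually, so streaming is the practical way to peek at them. A minimal sketch with the Hugging Face datasets library; the repo id is an assumption, check the collection for the exact names.

```python
# Hedged sketch: stream a few records from one of the pretraining sets
# without downloading the whole corpus. The repo id is a guess.
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-CC-v2", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example)   # inspect the schema of the first few records
    if i == 2:
        break
```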

Post-Training Data

Core post-training dumps (SFT/RL blends):

* Llama Nemotron Post-Training v1.1 (Apr 2025)

* Nemotron Post-Training v1 (Jul 2025)

* Nemotron Post-Training v2 (Aug 2025)

2025 reasoning/code SFT corpora:

* OpenMathReasoning (Apr 2025)

* OpenCodeReasoning (Apr 2025), OpenCodeReasoning-2 (May 2025)

* AceReason-1.1-SFT (Jun 2025)

* Nemotron-Math-HumanReasoning (Jun 2025), Nemotron-PrismMath (Apr 2025)

NeMo Gym RLVR datasets: Collection

Nemotron v3 post-training (Dec 2025): Collection


Nathan Lambert