Episode 61: DeepSeek Models Explained - Part II

Updated: 2025-02-04 | Duration: 01:08:35

Description

What if AI could be 95% cheaper? Discover how DeepSeek's game-changing models are reshaping the AI landscape through breakthrough innovations. Journey through the evolution of AI optimization, from GPU efficiency to revolutionary attention mechanisms. Learn when to use (and when to avoid) these powerful new models, with practical insights for both individual users and businesses.

Key highlights:

How DeepSeek achieves dramatic cost reduction through technical innovation

Real-world implications for consumers and enterprises

Critical considerations around data privacy and model alignment

Practical guidance on responsible implementation

References:

Dario Amodei: On DeepSeek and Export Controls

Bite: How Deepseek R1 was trained

[2501.17161] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

[2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

[2408.15664] Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

[2412.19437] DeepSeek-V3 Technical Report

[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

By Saugata Chatterjee