The AI Concepts Podcast

51 Episodes

Reverse

Module 3: Context Windows & Attention Complexity

2026-01-2610:37

This episode addresses the physical and mathematical limits of a model’s "short-term memory." We explore the context window and the engineering trade-offs required to process long documents. You will learn about the quadratic cost of attention where doubling the input length quadruples the computational work and why this creates a massive bottleneck for long-form reasoning. We also introduce the architectural tricks like Flash Attention that allow us to push these limits further. By the end, you will understand why context is the most expensive real estate in the generative stack.

Module 3: The Lifecycle of an LLM : Pre-Training

2026-01-2510:11

This episode explores the foundational stage of creating an LLM known as the pre-training phase. We break down the Trillion Token Diet by explaining how models move from random weights to sophisticated world models through the simple objective of next token prediction. You will learn about the Chinchilla Scaling Laws or the mathematical relationship between model size and data volume. We also discuss why the industry shifted from building bigger brains to better fed ones. By the end, you will understand the transition from raw statistical probability to parametric memory.

Module 2: The MLP Layer - Where Transformers Store Knowledge

2026-01-0607:50

Shay explains where a transformer actually stores knowledge: not in attention, but in the MLP (feed-forward) layer. The episode frames the transformer block as a two-step loop: attention moves information between tokens, then the MLP transforms each token’s representation independently to inject learned knowledge.

Module 2: The Encoder (BERT) vs. The Decoder (GPT)

2026-01-0508:23

Shay breaks down the encoder vs decoder split in transformers: encoders (BERT) read the full text with bidirectional attention to understand meaning, while decoders (GPT) generate text one token at a time using causal attention. She ties the architecture to training (masked-word prediction vs next-token prediction), explains why decoder-only models dominate today (they can both interpret prompts and generate efficiently with KV caching), and previews the next episode on the MLP layer, where most learned knowledge lives.

Module 2: Multi Head Attention & Positional Encodings

2026-01-0509:00

Shay explains multi-head attention and positional encodings: how transformers run multiple parallel attention 'heads' that specialize, why we concatenate their outputs, and how positional encodings reintroduce word order into parallel processing. The episode uses clear analogies (lawyer, engineer, accountant), highlights GPU efficiency, and previews the next episode on encoder vs decoder architectures.

Module 2: Inside the Transformer -The Math That Makes Attention Work

2026-01-0311:58

In this episode, Shay walks through the transformer's attention mechanism in plain terms: how token embeddings are projected into queries, keys, and values; how dot products measure similarity; why scaling and softmax produce stable weights; and how weighted sums create context-enriched token vectors. The episode previews multi-head attention (multiple perspectives in parallel) and ends with a short encouragement to take a small step toward your goals.

Module 2: Attention Is All You Need (The Concept)

2026-01-0311:40

Shay breaks down the 2017 paper "Attention Is All You Need" and introduces the transformer: a non-recurrent architecture that uses self-attention to process entire sequences in parallel. The episode explains positional encoding, how self-attention creates context-aware token representations, the three key advantages over RNNs (parallelization, global receptive field, and precise signal mixing), the quadratic computational trade-off, and teases a follow-up episode that will dive into the math behind attention.

Module 2: The Transformer Architecture: History - The Bottleneck That Broke Language Models

2026-01-0307:28

Shay breaks down why recurrent neural networks (RNNs) struggled with long-range dependencies in language: fixed-size hidden states and the vanishing gradient caused models to forget early context in long texts. He explains how LSTMs added gates (forget, input, output) to manage memory and improve short-term performance but remained serial, creating a training and scaling bottleneck that prevented using massive parallel compute. The episode frames this fundamental bottleneck in NLP and sets up the next episode on attention, ending with a brief reflection on persistence and steady effort.

Module 1: Tokens - How Models Really Read

2025-12-1311:56

This episode dives into the hidden layer where language stops being words and becomes numbers. We explore what tokens actually are, how tokenization breaks text into meaningful fragments, and why this design choice quietly shapes a model’s strengths, limits, and quirks. Once you understand tokens, you start seeing why language models sometimes feel brilliant and sometimes strangely blind.

Module 1: The Autoregressive Assumption | How Language Emerges in AI

2025-12-1314:11

This episode explores the hidden engine behind how language models move from knowing to creating. It reveals why generation happens step by step, why speed has hard limits, and why training and usage behave so differently. Once you see this mechanism, the way models write, reason, and sometimes stall will make immediate sense.

Module 1: The Latent Space & Manifolds | How Models Encode Meaning

2025-12-1312:23

This episode is about the hidden space where generative models organize meaning. We move from raw data into a compressed representation that captures concepts rather than pixels or tokens, and we explore how models learn to navigate that space to create realistic outputs. Understanding this idea explains both the power of generative AI and why it sometimes fails in surprising ways.

Module 1: The Generative Turn (Discriminative vs. Generative)

2025-12-1307:59

Welcome to Episode One of The Generative Shift. This episode introduces the core change behind modern AI, the move from discriminative models that draw decision boundaries to generative models that learn the full structure of data. Instead of predicting labels using conditional probability, generative systems model the joint distribution itself, which allows them to create rather than classify. This shift reshapes the math, the architecture, and the compute requirements, moving from compression focused networks to expansion driven systems that grow structure from noise. It is harder and more expensive, but it is the foundation of everything that follows. In the next episode, we will explore where this expansion lives by stepping into latent space and understanding how models represent meaning itself.

Intro to The Generative AI Series

2025-12-1301:48

Hello everyone, and welcome to The Generative AI Series. I’m Shay, and this introductory episode is about why this series exists and who it is for. Generative AI has exploded, but real understanding is still scattered. Between hype, shortcuts, and surface level strategy talk, it is hard to find a clear path from fundamentals to building systems that actually work. This series is for practitioners, builders, architects, and technical leaders who want to understand how these models work under the hood, why they succeed, and why they fail. We will go deep but stay accessible, moving step by step from the shift from classification to generation, through transformers, training, RAG, evaluation, and production realities. The goal is simple: build intuition, recognize failure modes early, and design solutions and strategies that work beyond demos, in the real world. Let’s get started. I’ll see you in Module One.

Deep Learning Series: Autoencoders

2025-07-1706:08

Welcome to the final episode of our Deep Learning series on the AI Concepts Podcast. In this episode, host Shay takes you on a journey through the world of autoencoders, a foundational AI model. Unlike traditional models that predict or label, autoencoders excel in understanding and reconstructing data by learning to compress information. Discover how this quiet revolution in AI powers features like image enhancement and noise-cancelling technology, and serves as a stepping stone towards generative AI. Whether you're an AI enthusiast or new to the field, this episode offers insightful perspectives on how machines learn structure and prepare for the future of AI.

Deep Learning Series: Transformers

2025-07-1709:30

Welcome to the AI Concepts Podcast, where we explore AI, one concept at a time. In this episode, host Shay delves into the transformative world of transformers in AI, focusing on how they have revolutionized language understanding and generation. Discover how transformers enable models like ChatGPT to respond thoughtfully and coherently, transforming inputs into conversational outputs with unprecedented accuracy. The discussion unveils the structure and function of transformers, highlighting their reliance on parallel processing and vast datasets. Tune in to unravel how transformers are not only reshaping AI but also the foundation of deep learning advances. Relax, sip your coffee, and let's explore AI together.

Deep Learning Series: Attention Mechanism

2025-07-1708:26

In this episode of the AI Concepts Podcast, host Shay delves into the transformation of deep learning architectures, highlighting the limitations of RNNs, LSTM, and GRU models when handling sequence processing and long-range dependencies. The breakthrough discussed is the attention mechanism, which allows models to dynamically focus on relevant parts of input, improving efficiency and contextual awareness. Shay unpacks the process where every word in a sequence is analyzed for its relevance using attention scores, and how this mechanism contributes to faster training, better scalability, and a more refined understanding in AI models. The episode explores how attention, specifically self-attention, has become a cornerstone for modern architectures like GPT, BERT, and others, offering insights into AI's ability to handle text, vision, and even multimodal inputs efficiently. Tune in to learn about the transformative role of attention in AI and prepare for a deeper dive into the upcoming discussion on the transformer architecture, which has revolutionized AI development by focusing solely on attention.

Deep Learning Series: Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)

2025-04-1309:51

Welcome to another episode of the AI Concepts Podcast, where we simplify complex AI topics into digestible explanations. This episode continues our Deep Learning series, diving into the limitations of Recurrent Neural Networks (RNNs) and introducing their game-changing successors: Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). Learn how these architectures revolutionize tasks with long-term dependencies by mastering memory control and selective information processing, paving the way for more advanced AI applications. Explore the intricate workings of gates within LSTMs, which help in managing information flow for better memory retention, and delve into the lightweight efficiency of GRUs. Understand how these innovations bridge the gap between theoretical potential and practical efficiency in AI tasks like language processing and time series prediction. Stay tuned for our next episode, where we’ll unravel the attention mechanism, a groundbreaking development that shifts the paradigm from memory reliance to direct input relevance, crucial for modern models like transformers.

Deep Learning Series: Recurrent Neural Network

2025-04-1306:21

Welcome to the AI Concepts Podcast! In this episode, we dive into the fascinating world of Recurrent Neural Networks (RNNs) and how they revolutionize the processing of sequential data. Unlike models you've heard about in previous episodes, RNNs provide the capability to remember context over time, making them essential for tasks involving language, music, and time series predictions. Using analogies and examples, we delve into the mechanics of RNNs, exploring how they utilize hidden states as memory to process data sequences effectively. Discover how RNNs, envisioned with loops and time-state memory, tackle the challenge of contextual dependencies across data sequences. However, basic RNNs face limitations, like struggling with long-range dependencies due to issues like the vanishing gradient problem. We set the stage for our next episode where we'll discuss advanced architectures, such as LSTMs and GRUs, which are designed to overcome these challenges. Tune in for a captivating exploration of how RNNs handle various AI tasks and join us in our next episode to learn how these networks have evolved with advanced mechanisms for improved learning and memory retention.

Deep Learning Series: Convolutional Neural Network

2025-04-1306:25

Welcome to the AI Concepts Podcast! In this deep dive into Convolutional Neural Networks (CNNs), we unravel their unique ability to process and interpret image data by focusing on local patterns and spatial structures. Understand how CNNs tackle the challenge of vast input sizes and learn to identify features without exhaustive connections, making them ideal for tasks involving images. Explore the mechanics of CNNs as they employ filters and pooling techniques, transforming raw pixel data into meaningful insights through feature maps. Discover how these networks create a hierarchy of features, akin to human visual processing, to classify and predict with remarkable accuracy. Get ready to expand your perspective on AI, as we prepare to embark on the next journey into Recurrent Neural Networks (RNNs) for handling sequential data. Join us, embrace gratitude in present moments, and stay curious!

Deep Learning Series: What is Batch Normalization?

2025-04-1307:12

In this episode of the AI Concepts Podcast, host Shay delves into the complexities of deep learning, focusing on the challenges of training deep neural networks. She explains how issues like internal covariate shift can hinder learning processes, especially as network layers increase. Through the lens of batch normalization, Shea illuminates how this pivotal technique stabilizes learning by normalizing the inputs of each layer, facilitating faster, more stable training. Learn about the profound impact of batch normalization and why it’s a cornerstone innovation in modern deep learning. The episode concludes with reflections on the importance of directing one's attention wisely, setting the stage for future discussions on convolutional neural networks and their role in image recognition.

#box-pro-ellipsis-177058343703028{-webkit-line-clamp:2;}The AI Concepts Podcast