Build Wiz AI Show

Author: Build Wiz AI
© Build Wiz AI
Description
Building the future of products with AI-powered innovation.
Build Wiz AI Show is your go-to podcast for transforming the latest and most interesting papers, articles, and blogs about AI into an easy-to-digest audio format. Using NotebookLM, we break down complex ideas into engaging discussions, making AI knowledge more accessible. Have a resource you’d love to hear in podcast form? Send us the link, and we might feature it in an upcoming episode! 🚀🎙️
138 Episodes
SEAL, the Self-Adapting Language Model framework, is revolutionizing how LLMs learn by enabling them to generate their own finetuning data and update directives. We explore how these powerful models create "self-edits"—synthetic training data and optimization parameters—which are continuously refined through a reinforcement learning loop. Discover how this meta-learning approach allows LLMs to efficiently incorporate new factual knowledge and significantly improve few-shot generalization success rates.
Are expensive Large Language Model (LLM) fine-tuning methods holding back your specialized agents, demanding massive computational resources and data? We dive into Training-Free Group Relative Policy Optimization (Training-Free GRPO), a novel non-parametric method that enhances LLM agent behavior by distilling semantic advantages from group rollouts into lightweight token priors, eliminating costly parameter updates. Discover how this highly efficient approach achieves significant performance gains in specialized domains like mathematical reasoning and web searching, often surpassing traditional fine-tuning while using only dozens of training samples.
Join us for a deep dive with Greg Brockman on the future of AI, where he reveals the internal struggle ("pain and suffering") of managing compute scarcity and the immense physical infrastructure build required to scale systems like Sora 2. Brockman discusses the shift from viewing AGI as a destination to a continuous process, emphasizing that current scaling curves and algorithmic progress continue unabated. We also explore the inevitable move toward proactive AI agents and a fully generative web, predicting a major change to the social contract and web monetization.
Tune in as we explore Agentic Context Engineering (ACE), a novel framework designed to overcome limitations like "brevity bias" and "context collapse" that plague traditional LLM context adaptation methods. ACE transforms model contexts into continuously evolving, structured "playbooks" by employing a modular process of generation, reflection, and curation. We discuss how this approach enables scalable, self-improving agents, yielding substantial performance gains on complex tasks—such as +10.6% on agent benchmarks—while significantly lowering adaptation latency and cost.
This episode explores the Tiny Recursive Model (TRM), a novel approach that leverages a single, tiny network (as small as 7M parameters) to tackle hard puzzle tasks like Sudoku, Maze, and ARC-AGI. We investigate how this simplified, recursive reasoning strategy achieves significantly higher generalization and outperforms much larger models, including complex Large Language Models (LLMs) and the Hierarchical Reasoning Model (HRM). Discover why this "less is more" philosophy is leading to breakthroughs in parameter-efficient AI reasoning by dispensing with the complex mathematical theories and biological justifications that earlier architectures relied on.
We demystify Large Language Model (LLM) evaluation, breaking down the four main methods used to compare models: multiple-choice benchmarks, verifiers, leaderboards, and LLM judges. We offer a clear mental map of these techniques, distinguishing between benchmark-based and judgment-based approaches to help you interpret performance scores and measure progress in your own AI development. Discover the pros and cons of each method—from MMLU accuracy checks to the dynamic Elo ranking system—and learn why combining them is key to holistic model assessment.
Original blog post: https://magazine.sebastianraschka.com/p/llm-evaluation-4-approaches
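To make the benchmark-based approach concrete, here is a minimal sketch of MMLU-style multiple-choice scoring: accuracy is just the fraction of items where the model's chosen letter matches the gold answer. The item format and `predict_fn` are hypothetical stand-ins, not the actual MMLU harness.

```python
def benchmark_accuracy(items, predict_fn):
    """Toy MMLU-style scorer: each item carries a question, its answer
    choices, and a gold letter; accuracy is the fraction of exact
    matches. `predict_fn` stands in for a model returning 'A'-'D'."""
    correct = sum(
        predict_fn(item["question"], item["choices"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

items = [
    {"question": "2+2?", "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "Capital of France?",
     "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": "A"},
]
# A degenerate "model" that always answers B gets one of two right.
always_b = lambda question, choices: "B"
print(benchmark_accuracy(items, always_b))  # → 0.5
```

Leaderboards replace this fixed gold-answer check with pairwise human (or judge) preferences aggregated into an Elo-style rating, which is why the two method families can disagree.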
OpenAI DevDay 2025 marked the start of the "agentic era" of software development, focusing on making it "easier to build with AI" and transitioning AI from a "chatbot" into a "doer". We break down the revolutionary AgentKit, featuring Agent Builder, a visual, drag-and-drop platform launched to help developers rapidly deploy multi-step AI agents from prototype to production. We also discuss the new Apps SDK for seamlessly integrating third-party services into ChatGPT and the debut of powerful models like GPT-5 Pro and Sora 2, signifying that software development now takes minutes, not months.
Join us as Turing Award recipient Yann LeCun, Chief Scientist at Meta, critiques the state of AI, arguing that current systems, including Large Language Models (LLMs), are nowhere near matching the learning efficiency observed in humans and animals. LeCun proposes a major architectural shift, advocating that AI must abandon generative models for training and instead focus on building internal "World Models" to enable reasoning and planning. Discover how the Joint Embedding Predictive Architecture (JEPA) uses self-supervised learning to train machines to acquire robust, abstract representations of reality, a crucial step toward achieving common sense and human-level intelligence.
Are smart machines making us forget how to think? This episode dives into the quiet phenomenon of AI-induced skill erosion, where relying on intelligent systems creates an "illusion of mastery" while core competence fades. We explore the organizational implications of deskilling and discuss strategies, such as targeted auditing and better system design, needed to preserve expertise when AI handles essential tasks.
AI agents represent a paradigm shift in software engineering, but moving a promising prototype to a production-ready system presents a new set of challenges for startups. This episode distills Google's technical guide, offering a systematic roadmap to build, govern, and scale reliable agentic systems using tools like the Agent Development Kit (ADK). Discover the AgentOps framework, a disciplined approach for ensuring your agents are not just powerful, but also responsible and production-ready.
In the race to build AI that can not just think, but work as an autonomous agent, the prevailing wisdom has been that more data is always better. This episode explores a radical new paradigm called LIMI (Less Is More for Intelligent Agency), which challenges this scaling law by showing how sophisticated AI "agency" can emerge from just 78 strategically curated training samples focused on real-world collaborative workflows. Discover how this approach dramatically outperforms state-of-the-art models trained on datasets over 100 times larger, establishing the Agency Efficiency Principle and fundamentally reshaping the future of AI development.
This episode delves into the unprecedented speed of AI adoption, which has outpaced historical technologies like the internet and personal computers. While AI's impact on work is often the focus, new data reveals a surprising trend: non-work related conversations now account for over 70% of consumer chatbot usage, with tasks like seeking practical guidance and writing assistance dominating over coding. We'll explore the strikingly uneven patterns of AI deployment across different countries and enterprises, and discuss what this concentration could mean for global economic inequality and the future of work.
While Large Language Models excel at creative tasks, they often struggle with the logical precision required for symbolic planning. This episode explores PDDL-INSTRUCT, a novel framework that teaches LLMs to reason through complex plans using a logical "chain-of-thought" approach, verifying each step to ensure its validity. Tune in to learn how this method dramatically improves planning accuracy by up to 66%, bridging the gap between general AI reasoning and the formal logic of automated planning.
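The step-by-step verification idea can be illustrated with a minimal STRIPS-style plan checker (an illustrative sketch, not the PDDL-INSTRUCT framework itself): a step is valid only if its preconditions hold in the current state, and applying it updates the state by its add and delete effects. The action names and literals here are invented for the example.

```python
def verify_plan(initial_state, plan, actions):
    """Check a plan step by step. `actions` maps a name to a triple
    (preconditions, add_effects, delete_effects), each a set of
    ground literals. Returns (True, None) if every step's
    preconditions hold, else (False, first_invalid_step)."""
    state = set(initial_state)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:          # a precondition is unsatisfied
            return False, name
        state = (state - delete) | add  # apply the action's effects
    return True, None

# Hypothetical one-block gripper domain.
actions = {
    "pick": ({"on_table", "hand_empty"}, {"holding"},
             {"on_table", "hand_empty"}),
    "drop": ({"holding"}, {"on_table", "hand_empty"}, {"holding"}),
}
ok, bad_step = verify_plan({"on_table", "hand_empty"},
                           ["pick", "drop"], actions)
print(ok)  # → True
```

PDDL-INSTRUCT's insight is to have the LLM articulate exactly this kind of precondition-and-effect reasoning in its chain of thought, with an external checker catching invalid steps during training.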
In this episode, we explore self-consistency, a novel strategy that significantly improves how large language models perform complex reasoning. The method builds on chain-of-thought prompting by generating multiple diverse reasoning paths for a single problem instead of just one. By simply selecting the most consistent answer from these different lines of thought, this unsupervised technique dramatically boosts accuracy on arithmetic and commonsense tasks without any additional model training.
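The core of self-consistency fits in a few lines: sample several chain-of-thought completions for the same prompt and keep the final answer that appears most often. This is a minimal sketch; `sample_fn` is a hypothetical callable standing in for one sampled model completion.

```python
from collections import Counter

def self_consistency(prompt, sample_fn, n_paths=5):
    """Sample `n_paths` chain-of-thought completions and return the
    most frequent final answer. `sample_fn(prompt)` is assumed to
    return a (reasoning, final_answer) pair for one sampled path."""
    answers = [sample_fn(prompt)[1] for _ in range(n_paths)]
    # Majority vote over final answers (ties break by first seen).
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a sampled model: three of five paths agree on "18".
fake_paths = iter([("path A", "18"), ("path B", "17"), ("path C", "18"),
                   ("path D", "18"), ("path E", "20")])
result = self_consistency("Q: ...", lambda p: next(fake_paths), n_paths=5)
print(result)  # → 18
```

Note that the vote is over final answers only; the diverse reasoning paths can disagree in their intermediate steps, which is exactly what makes the marginalization useful.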
AI is being adopted at a record-breaking pace, far exceeding previous technologies like the internet. However, this new report reveals that the AI revolution is starkly uneven, with usage heavily concentrated in high-income countries and among a narrow set of specialized business tasks. We'll unpack the data on who is using AI and how—from collaborative augmentation to full automation—and explore whether this technology will widen or narrow global economic inequality.
This episode unpacks groundbreaking research into how hundreds of millions of people are actually using ChatGPT. Contrary to popular belief, non-work-related messages have surged to over 70% of all use, with the most common topics being "Practical Guidance," "Seeking Information," and "Writing". We explore the surprising user demographics, including a closing gender gap, and what these patterns reveal about the technology's true economic and social value.
Welcome to "AI Unpacked," your guide to the fascinating world of Large Language Models! In this episode, we'll break down the core concepts, techniques, and critical challenges shaping LLMs, from their internal workings like tokenization and attention mechanisms to real-world deployment considerations. Join us to deepen your understanding of these revolutionary AI systems.
In this insightful episode, Sam Altman and Vinod Khosla delve into the world beyond 2035, discussing the astonishing rate of technological change driven by AI and its profound implications for industries and jobs. They explore the rapid acceleration of AI capabilities, the vision for AI as a "default personal AGI," and its potential to revolutionize enterprise operations and scientific discovery. The conversation also touches on the global spread of AI's benefits, the promise of a hugely deflationary economy, and the critical role of government in navigating this transformative era.
Join us as we explore LLM knowledge distillation, a groundbreaking technique that compresses powerful language models into efficient, task-specific versions for practical deployment. This episode delves into methods like TinyLLM and Distilling Step-by-Step, revealing how they transfer complex reasoning capabilities to smaller models, often outperforming their larger counterparts. We'll discuss the benefits, challenges, and compare distillation with other LLM adaptation strategies like fine-tuning and prompt engineering.
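For readers new to distillation, here is the classic soft-label loss at the heart of most of these methods, written in plain Python for clarity: the student is trained to match the teacher's temperature-softened output distribution, with the loss scaled by T² to keep gradient magnitudes stable. This is a generic sketch of the technique, not the specific objective used by TinyLLM or Distilling Step-by-Step.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation term: cross-entropy between the
    teacher's softened distribution and the student's, scaled by T^2
    so gradients keep their magnitude as the temperature grows."""
    p = softmax(teacher_logits, T)  # teacher "soft targets"
    q = softmax(student_logits, T)  # student predictions
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student that matches the teacher's logits exactly incurs a lower
# loss than one with uninformative (uniform) logits.
teacher = [2.0, 0.5, -1.0]
matched = distillation_loss(teacher, teacher)
shifted = distillation_loss([0.0, 0.0, 0.0], teacher)
print(matched < shifted)  # → True
```

In practice this soft term is combined with the ordinary hard-label cross-entropy on ground-truth data; raising the temperature exposes the teacher's "dark knowledge" about how wrong the wrong answers are.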
In this episode, we delve into why language models "hallucinate," generating plausible yet incorrect information instead of admitting uncertainty. We'll explore how these overconfident falsehoods arise from the statistical objectives minimized during pretraining and are further reinforced by current evaluation methods that reward guessing over expressing doubt. Join us as we uncover the socio-technical factors behind this persistent problem and discuss proposed solutions to foster more trustworthy AI systems.