Discover
Build Wiz AI Show
Build Wiz AI Show
Author: Build Wiz AI
Subscribed: 10Played: 492Subscribe
Share
© Build Wiz AI
Description
Build Wiz AI Show is your go-to podcast for transforming the latest and most interesting papers, articles, and blogs about AI into an easy-to-digest audio format. Using NotebookLM, we break down complex ideas into engaging discussions, making AI knowledge more accessible.
Have a resource you’d love to hear in podcast form? Send us the link, and we might feature it in an upcoming episode! 🚀🎙️
Have a resource you’d love to hear in podcast form? Send us the link, and we might feature it in an upcoming episode! 🚀🎙️
204 Episodes
Reverse
Join us as we tackle the "last mile" of AI Agents series, exploring the rigorous operational discipline required to transform fragile prototypes into reliable, production-grade agents. We break down the essential infrastructure of trust—from evaluation gates to automated pipelines—that ensures your system is ready for the real world. This foundation paves the way for our series finale where we will move beyond single-agent operations to look at the future of autonomous collaboration within multi-agent ecosystems.
In this episode of our AI Agent series, we synthesize our previous discussions on evaluation frameworks and observability into a cohesive operational playbook known as the "Agent Quality Flywheel",. Join us as we explore how to transform raw telemetry into a continuous feedback loop that drives relentless improvement, ensuring your autonomous agents are not just capable, but truly trustworthy,. This episode bridges the critical gap between theory and production, providing the final principles needed to build enterprise-grade reliance in an agentic world,.
Moving beyond the temporary "workbench" of individual sessions, episode 3 of our AI Agents series unlocks the power of Memory—the mechanism that allows AI agents to persist knowledge and personalize interactions over time. We will explore how agents utilize an intelligent extraction and consolidation pipeline to curate a long-term "filing cabinet" of user insights, effectively turning them from generic assistants into experts on the specific users they serve.
In this episode in series AI Agents, we discuss the transformation of foundation models from static prediction engines into "Agentic AI" capable of using tools as their "eyes and hands" to interact with the world. We highlight the complexity of connecting these agents to diverse enterprise systems, setting the stage for Episode 1 to explain the "N x M" integration problem and the Model Context Protocol's role in solving it.
Join us for the premiere of our series on AI Agents, exploring the paradigm shift from passive predictive models to autonomous systems capable of reasoning and action. In this episode, we deconstruct the core anatomy of an agent—its "Brain" (Model), "Hands" (Tools), and "Nervous System" (Orchestration)—to explain the fundamental "Think, Act, Observe" loop that drives their behavior. Discover how this architecture enables developers to move beyond traditional coding to become "directors" of intelligent, goal-oriented applications.
DeepSeek-AI introduces DeepSeek-R1, a reasoning model developed through reinforcement learning (RL) and distillation techniques. The research explores how large language models can develop reasoning skills, even without supervised fine-tuning, highlighting the self-evolution observed in DeepSeek-R1-Zero during RL training. DeepSeek-R1 addresses limitations of DeepSeek-R1-Zero, like readability, by incorporating cold-start data and multi-stage training. Results demonstrate DeepSeek-R1 achieving performance comparable to OpenAI models on reasoning tasks, and distillation proves effective in empowering smaller models with enhanced reasoning capabilities. The study also shares unsuccessful attempts with Process Reward Models (PRM) and Monte Carlo Tree Search (MCTS), providing valuable insights into the challenges of improving reasoning in LLMs. The open-sourcing of the models aims to support further research in this area.
Google's "AI Business Trends 2025" report analyzes key strategic trends expected to reshape businesses. It identifies five major trends including the rise of multimodal AI, the evolution of AI agents, the importance of assistive search, the need for seamless AI-powered customer experiences, and the increasing role of AI in security. The report suggests that companies need to understand how AI has impacted current market dynamics in order to innovate, compete and continue to evolve. Early adopters of AI are more likely to dominate the market and grow faster than traditional competitors. This analysis is based on data insights from studies, surveys of global decision makers, and Google Trends research, using tools like NotebookLM to identify these pivotal shifts.
This podcast presents a comprehensive survey of post-training techniques for Large Language Models (LLMs), focusing on methodologies that refine these models beyond their initial pre-training. The key post-training strategies explored include fine-tuning, reinforcement learning (RL), and test-time scaling, which are critical for improving reasoning, accuracy, and alignment with user intentions. It examines various RL techniques such as Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) in LLMs. The survey also investigates benchmarks and evaluation methods for assessing LLM performance across different domains, discussing challenges such as catastrophic forgetting and reward hacking. The document concludes by outlining future research directions, emphasizing hybrid approaches that combine multiple optimization strategies for enhanced LLM capabilities and efficient deployment. The aim is to guide the optimization of LLMs for real-world applications by consolidating recent research and addressing remaining challenges.
Forget the needle in the haystack—can your AI actually sculpt an answer from a mountain of data? This episode explores the "Michelangelo" framework, a new evaluation that challenges models to "chisel away" irrelevant noise to reveal the latent structure hidden within massive contexts. Discover how frontier models like Gemini, GPT-4o, and Claude 3.5 Sonnet stack up in these grueling reasoning tasks and why even the "smartest" models face a sharp performance drop long before reaching the million-token mark.
What if the traditional waterfall process of PRDs, mocks, and manual coding is officially dead? This episode explores how coding agents are fundamentally reshaping EPD (Engineering, Product, and Design) by making implementation cheap and shifting the real bottleneck to expert review and system thinking. Discover why generalists are more valuable than ever and how you can navigate the blending of roles to become a top-tier builder or reviewer in the AI era.
What happens when your autonomous AI assistant decides to go rogue or has its core mission hijacked by a single malicious prompt? Join us as we explore the OWASP Top 10 for Agentic Applications 2026, a critical guide to the new security frontier where autonomous systems plan and execute complex tasks across diverse environments. You’ll discover how to safeguard the future of AI using the principles of Least-Agency and Strong Observability to prevent everything from tool exploitation to catastrophic cascading failures.
Are AI agents really the next "smartphone app" revolution, or is the lack of specialized development tools holding them back?, This episode explores why building these autonomous systems requires moving past traditional linear coding into a world of experimental "observability" to ensure they are actually reliable and secure in production. You'll discover the core building blocks like memory and planning, and walk through a real-world workflow to take your agents from a simple prototype to a high-performing financial researcher.
Stop re-explaining your process in every new chat and start building "Skills," the modular instruction sets that transform Claude into a specialized expert tailored to your unique, repeatable workflows. In this episode, we dive into the "recipe" for success, exploring how to combine your domain knowledge with Model Context Protocol (MCP) to orchestrate complex, multi-step tasks across various services. Tune in to discover the practical patterns needed to plan, test, and distribute your own specialized capabilities, allowing you to build a functional skill in as little as 15 to 30 minutes.
Remember the quiet weeks before the world changed in 2020? We are currently in that same "this seems overblown" phase with AI, as an intelligence explosion begins where machines are now instrumental in creating their own successors. Join us to discover how AI is shifting from a helpful tool to a substitute for complex cognitive work and why spending just one hour a day experimenting with these new models could be the most important career move you ever make.Source: https://shumer.dev/something-big-is-happening
Think your business is safe from cyber threats just because you have the latest software? Think again—2026 is officially the year AI moves from being a simple tool to an autonomous, independent force in the cybersecurity world. In this episode, we dive into how "agentic AI" is reinventing the security operations center and why traditional "scan and patch" models are mathematically impossible to sustain against machine-speed attacks. You'll discover the essential playbook for navigating 2026 trends like Ransomware 5.0 and zero-trust, ensuring your organization is ready for the era of self-defending digital ecosystems.
Ever wonder why even the most advanced AI agents struggle to handle complex tasks in the real world? Today, we explore the Agent World Model (AWM), an open-source pipeline that generates 1,000 diverse, code-driven environments—spanning everything from e-commerce to finance—using SQL databases to provide the stable, consistent training grounds that agents need to master multi-turn tool use. Tune in to discover how these "infinite" synthetic sandboxes are solving the problem of simulation hallucinations and training agents that can generalize to almost any real-world scenario they encounter.Source: https://arxiv.org/abs/2602.10090
Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.Source: https://arxiv.org/pdf/2602.03085
Is the global AI landscape shifting toward a "DeepSeek moment" where cheaper, open-weight models from China challenge the dominance of US frontier labs? Join machine learning experts Sebastian Raschka and Nathan Lambert as they dissect the technical breakthroughs of 2025, from the evolving physics of inference-time scaling to the intensifying competition between organizations like OpenAI, Anthropic, and their international counterparts. Listeners will discover how these advancements are fundamentally transforming the future of software engineering and explore the realistic, often "jagged" roadmap toward AGI.
Is your favorite AI assistant a productivity powerhouse or a "shortcut" that’s secretly stalling your professional growth? We dive into new research revealing that while AI helps you finish tasks, it can simultaneously impair your conceptual understanding and debugging skills by as much as 17%. Join us as we uncover the specific "high-engagement" interaction patterns that allow you to leverage AI without losing your competitive edge.
This episode explores how Generative AI is shifting software development from simple tool-based experimentation to a holistic transformation of engineering excellence. We discuss strategies for achieving 2x engineering capacity while overcoming critical "value blockers" like the toil paradox and organizational resistance. Finally, we examine the future of development teams, where humans transition from primary code creators into expert validators and orchestrators.






