Build Wiz AI Show

Author: Build Wiz AI
© Build Wiz AI
Description
Building the future of products with AI-powered innovation.
Build Wiz AI Show is your go-to podcast for transforming the latest and most interesting papers, articles, and blogs about AI into an easy-to-digest audio format. Using NotebookLM, we break down complex ideas into engaging discussions, making AI knowledge more accessible. Have a resource you’d love to hear in podcast form? Send us the link, and we might feature it in an upcoming episode! 🚀🎙️
138 Episodes
SEAL, the Self-Adapting Language Model framework, is revolutionizing how LLMs learn by enabling them to generate their own finetuning data and update directives. We explore how these powerful models create "self-edits"—synthetic training data and optimization parameters—which are continuously refined through a reinforcement learning loop. Discover how this meta-learning approach allows LLMs to efficiently incorporate new factual knowledge and significantly improve few-shot generalization success rates.
Are expensive Large Language Model (LLM) fine-tuning methods holding back your specialized agents, demanding massive computational resources and data? We dive into Training-Free Group Relative Policy Optimization (Training-Free GRPO), a novel non-parametric method that enhances LLM agent behavior by distilling semantic advantages from group rollouts into lightweight token priors, eliminating costly parameter updates. Discover how this highly efficient approach achieves significant performance gains in specialized domains like mathematical reasoning and web searching, often surpassing traditional fine-tuning while using only dozens of training samples.
Join us for a deep dive with Greg Brockman on the future of AI, where he reveals the internal struggle ("pain and suffering") of managing compute scarcity and the immense physical infrastructure build required to scale systems like Sora 2. Brockman discusses the shift from viewing AGI as a destination to a continuous process, emphasizing that current scaling curves and algorithmic progress continue unabated. We also explore the inevitable move toward proactive AI agents and a fully generative web, predicting a major change to the social contract and web monetization.
Tune in as we explore Agentic Context Engineering (ACE), a novel framework designed to overcome limitations like "brevity bias" and "context collapse" that plague traditional LLM context adaptation methods. ACE transforms model contexts into continuously evolving, structured "playbooks" by employing a modular process of generation, reflection, and curation. We discuss how this approach enables scalable, self-improving agents, yielding substantial performance gains on complex tasks—such as +10.6% on agent benchmarks—while significantly lowering adaptation latency and cost.
This episode explores the Tiny Recursive Model (TRM), a novel approach that leverages a single, tiny network (as small as 7M parameters) to tackle hard puzzle tasks like Sudoku, Maze, and ARC-AGI. We investigate how this simplified, recursive reasoning strategy achieves significantly higher generalization and outperforms much larger models, including complex Large Language Models (LLMs) and the Hierarchical Reasoning Model (HRM). Discover why this "less is more" philosophy is leading to breakthroughs in parameter-efficient AI reasoning by dispensing with the complex mathematical theories and biological justifications that earlier architectures relied on.
We demystify Large Language Model (LLM) evaluation, breaking down the four main methods used to compare models: multiple-choice benchmarks, verifiers, leaderboards, and LLM judges. We offer a clear mental map of these techniques, distinguishing between benchmark-based and judgment-based approaches to help you interpret performance scores and measure progress in your own AI development. Discover the pros and cons of each method—from MMLU accuracy checks to the dynamic Elo ranking system—and learn why combining them is key to holistic model assessment.
Original blog post: https://magazine.sebastianraschka.com/p/llm-evaluation-4-approaches
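To make the benchmark-based approach concrete, here is a minimal sketch of MMLU-style multiple-choice scoring: accuracy is just the fraction of items where the model's chosen letter matches the gold answer. The item format and `predict_fn` are hypothetical stand-ins, not the actual MMLU harness.

```python
def benchmark_accuracy(items, predict_fn):
    """Toy MMLU-style scorer: each item carries a question, its answer
    choices, and a gold letter; accuracy is the fraction of exact
    matches. `predict_fn` stands in for a model returning 'A'-'D'."""
    correct = sum(
        predict_fn(item["question"], item["choices"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

items = [
    {"question": "2+2?", "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "Capital of France?",
     "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": "A"},
]
# A degenerate "model" that always answers B gets one of two right.
always_b = lambda question, choices: "B"
print(benchmark_accuracy(items, always_b))  # → 0.5
```

Leaderboards replace this fixed gold-answer check with pairwise human (or judge) preferences aggregated into an Elo-style rating, which is why the two method families can disagree.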
OpenAI DevDay 2025 marked the start of the "agentic era" of software development, focusing on making it "easier to build with AI" and transitioning AI from a "chatbot" into a "doer". We break down the revolutionary AgentKit, featuring Agent Builder, a visual, drag-and-drop platform launched to help developers rapidly deploy multi-step AI agents from prototype to production. We also discuss the new Apps SDK for seamlessly integrating third-party services into ChatGPT and the debut of powerful models like GPT-5 Pro and Sora 2, signifying that software development now takes minutes, not months.
Join us as Turing Award recipient Yann LeCun, Chief Scientist at Meta, critiques the state of AI, arguing that current systems, including Large Language Models (LLMs), are nowhere near matching the learning efficiency observed in humans and animals. LeCun proposes a major architectural shift, advocating that AI must abandon generative models for training and instead focus on building internal "World Models" to enable reasoning and planning. Discover how the Joint Embedding Predictive Architecture (JEPA) uses self-supervised learning to train machines to acquire robust, abstract representations of reality, a crucial step toward achieving common sense and human-level intelligence.
Are smart machines making us forget how to think? This episode dives into the quiet phenomenon of AI-induced skill erosion, where relying on intelligent systems creates an "illusion of mastery" while core competence fades. We explore the organizational implications of deskilling and discuss strategies, such as targeted auditing and better system design, needed to preserve expertise when AI handles essential tasks.
AI agents represent a paradigm shift in software engineering, but moving a promising prototype to a production-ready system presents a new set of challenges for startups. This episode distills Google's technical guide, offering a systematic roadmap to build, govern, and scale reliable agentic systems using tools like the Agent Development Kit (ADK). Discover the AgentOps framework, a disciplined approach for ensuring your agents are not just powerful, but also responsible and production-ready.
In the race to build AI that can not just think, but work as an autonomous agent, the prevailing wisdom has been that more data is always better. This episode explores a radical new paradigm called LIMI (Less Is More for Intelligent Agency), which challenges this scaling law by showing how sophisticated AI "agency" can emerge from just 78 strategically curated training samples focused on real-world collaborative workflows. Discover how this approach dramatically outperforms state-of-the-art models trained on datasets over 100 times larger, establishing the Agency Efficiency Principle and fundamentally reshaping the future of AI development.
This episode delves into the unprecedented speed of AI adoption, which has outpaced historical technologies like the internet and personal computers. While AI's impact on work is often the focus, new data reveals a surprising trend: non-work related conversations now account for over 70% of consumer chatbot usage, with tasks like seeking practical guidance and writing assistance dominating over coding. We'll explore the strikingly uneven patterns of AI deployment across different countries and enterprises, and discuss what this concentration could mean for global economic inequality and the future of work.
While Large Language Models excel at creative tasks, they often struggle with the logical precision required for symbolic planning. This episode explores PDDL-INSTRUCT, a novel framework that teaches LLMs to reason through complex plans using a logical "chain-of-thought" approach, verifying each step to ensure its validity. Tune in to learn how this method dramatically improves planning accuracy by up to 66%, bridging the gap between general AI reasoning and the formal logic of automated planning.
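The step-by-step verification idea can be illustrated with a minimal STRIPS-style plan checker (an illustrative sketch, not the PDDL-INSTRUCT framework itself): a step is valid only if its preconditions hold in the current state, and applying it updates the state by its add and delete effects. The action names and literals here are invented for the example.

```python
def verify_plan(initial_state, plan, actions):
    """Check a plan step by step. `actions` maps a name to a triple
    (preconditions, add_effects, delete_effects), each a set of
    ground literals. Returns (True, None) if every step's
    preconditions hold, else (False, first_invalid_step)."""
    state = set(initial_state)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:          # a precondition is unsatisfied
            return False, name
        state = (state - delete) | add  # apply the action's effects
    return True, None

# Hypothetical one-block gripper domain.
actions = {
    "pick": ({"on_table", "hand_empty"}, {"holding"},
             {"on_table", "hand_empty"}),
    "drop": ({"holding"}, {"on_table", "hand_empty"}, {"holding"}),
}
ok, bad_step = verify_plan({"on_table", "hand_empty"},
                           ["pick", "drop"], actions)
print(ok)  # → True
```

PDDL-INSTRUCT's insight is to have the LLM articulate exactly this kind of precondition-and-effect reasoning in its chain of thought, with an external checker catching invalid steps during training.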
In this episode, we explore self-consistency, a novel strategy that significantly improves how large language models perform complex reasoning. The method builds on chain-of-thought prompting by generating multiple diverse reasoning paths for a single problem instead of just one. By simply selecting the most consistent answer from these different lines of thought, this unsupervised technique dramatically boosts accuracy on arithmetic and commonsense tasks without any additional model training.
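The core of self-consistency fits in a few lines: sample several chain-of-thought completions for the same prompt and keep the final answer that appears most often. This is a minimal sketch; `sample_fn` is a hypothetical callable standing in for one sampled model completion.

```python
from collections import Counter

def self_consistency(prompt, sample_fn, n_paths=5):
    """Sample `n_paths` chain-of-thought completions and return the
    most frequent final answer. `sample_fn(prompt)` is assumed to
    return a (reasoning, final_answer) pair for one sampled path."""
    answers = [sample_fn(prompt)[1] for _ in range(n_paths)]
    # Majority vote over final answers (ties break by first seen).
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a sampled model: three of five paths agree on "18".
fake_paths = iter([("path A", "18"), ("path B", "17"), ("path C", "18"),
                   ("path D", "18"), ("path E", "20")])
result = self_consistency("Q: ...", lambda p: next(fake_paths), n_paths=5)
print(result)  # → 18
```

Note that the vote is over final answers only; the diverse reasoning paths can disagree in their intermediate steps, which is exactly what makes the marginalization useful.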
AI is being adopted at a record-breaking pace, far exceeding previous technologies like the internet. However, this new report reveals that the AI revolution is starkly uneven, with usage heavily concentrated in high-income countries and among a narrow set of specialized business tasks. We'll unpack the data on who is using AI and how—from collaborative augmentation to full automation—and explore whether this technology will widen or narrow global economic inequality.
This episode unpacks groundbreaking research into how hundreds of millions of people are actually using ChatGPT. Contrary to popular belief, non-work-related messages have surged to over 70% of all use, with the most common topics being "Practical Guidance," "Seeking Information," and "Writing". We explore the surprising user demographics, including a closing gender gap, and what these patterns reveal about the technology's true economic and social value.
Welcome to "AI Unpacked," your guide to the fascinating world of Large Language Models! In this episode, we'll break down the core concepts, techniques, and critical challenges shaping LLMs, from their internal workings like tokenization and attention mechanisms to real-world deployment considerations. Join us to deepen your understanding of these revolutionary AI systems.
In this insightful episode, Sam Altman and Vinod Khosla delve into the world beyond 2035, discussing the astonishing rate of technological change driven by AI and its profound implications for industries and jobs. They explore the rapid acceleration of AI capabilities, the vision for AI as a "default personal AGI," and its potential to revolutionize enterprise operations and scientific discovery. The conversation also touches on the global spread of AI's benefits, the promise of a hugely deflationary economy, and the critical role of government in navigating this transformative era.
Join us as we explore LLM knowledge distillation, a groundbreaking technique that compresses powerful language models into efficient, task-specific versions for practical deployment. This episode delves into methods like TinyLLM and Distilling Step-by-Step, revealing how they transfer complex reasoning capabilities to smaller models, often outperforming their larger counterparts. We'll discuss the benefits, challenges, and compare distillation with other LLM adaptation strategies like fine-tuning and prompt engineering.
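For readers new to distillation, here is the classic soft-label loss at the heart of most of these methods, written in plain Python for clarity: the student is trained to match the teacher's temperature-softened output distribution, with the loss scaled by T² to keep gradient magnitudes stable. This is a generic sketch of the technique, not the specific objective used by TinyLLM or Distilling Step-by-Step.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation term: cross-entropy between the
    teacher's softened distribution and the student's, scaled by T^2
    so gradients keep their magnitude as the temperature grows."""
    p = softmax(teacher_logits, T)  # teacher "soft targets"
    q = softmax(student_logits, T)  # student predictions
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student that matches the teacher's logits exactly incurs a lower
# loss than one with uninformative (uniform) logits.
teacher = [2.0, 0.5, -1.0]
matched = distillation_loss(teacher, teacher)
shifted = distillation_loss([0.0, 0.0, 0.0], teacher)
print(matched < shifted)  # → True
```

In practice this soft term is combined with the ordinary hard-label cross-entropy on ground-truth data; raising the temperature exposes the teacher's "dark knowledge" about how wrong the wrong answers are.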
In this episode, we delve into why language models "hallucinate," generating plausible yet incorrect information instead of admitting uncertainty. We'll explore how these overconfident falsehoods arise from the statistical objectives minimized during pretraining and are further reinforced by current evaluation methods that reward guessing over expressing doubt. Join us as we uncover the socio-technical factors behind this persistent problem and discuss proposed solutions to foster more trustworthy AI systems.