The Data, Lakehouse and AI Show

Author: Dremio Agentic Lakehouse


Description

A podcast where Dremio Head of DevRel, Alex Merced, gives you the rundown on what's going on in Data, Lakehouse and AI. Regular audio episodes with the occasional video webinar episode.

Experience Agentic Analytics by trying the Dremio Free Trial at: https://drmevn.fyi/get-started-pod26
3 Episodes
This week on The Data, Lakehouse and AI Show, we are covering one of the most significant weeks in the history of the open data ecosystem. Apache Polaris has officially graduated from the Apache Incubator to become a Top-Level Apache Software Foundation project. Co-created by Dremio and built around the Apache Iceberg REST Catalog specification, Polaris received 27 binding +1 votes in its graduation round, reflecting the community confidence built through six releases, more than 100 contributors, and nearly 3,000 merged pull requests. With graduation complete, the new Polaris PMC now operates with full ASF oversight and a community-driven roadmap focused on credential vending for non-AWS storage backends, deeper Delta Lake support, and idempotent commit operations. This is the open catalog moment the lakehouse ecosystem has been building toward.

Over on the Apache Iceberg dev list, the community is deep in V4 planning. The hottest thread this week is the proposal to make the root metadata.json file optional: for streaming write workloads on Hadoop and HMS catalog backends, writing that file on every commit creates real performance bottlenecks, and two paths are under debate. We also saw the first sync for Iceberg's native index support feature, a key V4 capability enabling faster lookups without full table scans. And a concurrency bug was flagged in the snapshot expiration logic, where a race window between the ExpireSnapshots job and concurrent ref additions could cause live snapshots to be incorrectly removed. The iceberg-go project already has a fix, and the Java team is investigating.

On the Apache Arrow side, a fascinating thread challenged the use of QUIC for IPC stream multiplexing: QUIC's independent delivery model conflicts with use cases requiring explicit ordering across batches from different logical streams. Arrow is also joining Google Summer of Code 2026 and published a formal security model this week.

Apache Parquet continued advancing the Adaptive Lossless Floating-Point (ALP) encoding spec, which enables more compact storage of floating-point columns, a meaningful win for ML feature tables and financial datasets. Parquet Java 1.17.0, which dropped Java 8 support, is now moving into broader production adoption, so if your team is still on Java 8, it is time to plan that upgrade.

In the AI chipset world, NVIDIA gave CNBC an exclusive first look at the Vera Rubin system: 72 Rubin GPUs and 36 Vera CPUs in a fully liquid-cooled rack-scale NVL72, set to ship in the second half of 2026. NVIDIA claims Rubin delivers a 10x reduction in inference token cost and requires 4x fewer GPUs to train MoE models compared to Blackwell. Jensen Huang teased a chip that will "surprise the world" at GTC on March 16. Meta expanded its NVIDIA partnership in a deal worth tens of billions, locking in millions of GPUs and Vera Rubin rack systems and becoming the first company to deploy NVIDIA's Grace CPUs as standalone data center chips.

In the agentic AI tooling space, Apple released Xcode 26.3 with native support for Claude Agent and OpenAI Codex, exposing Xcode's capabilities through the Model Context Protocol, one of the clearest mainstream signals yet that MCP is becoming the plumbing layer for agentic developer tools. OpenAI launched a macOS Codex app alongside GPT-5.2-Codex, their most advanced agentic coding model, with improvements in long-horizon context compaction, large-scale refactors, and cybersecurity capabilities.

On the Dremio front, the big news is STACKIT Dremio entering public preview: a sovereign, EU-operated lakehouse service built on Apache Iceberg and Apache Polaris for organizations that need EU data residency without sacrificing open standards. STACKIT's own internal deployment achieved 40% lower TCO and 4x faster time to insight. And the Forrester Q1 2026 Data Lakehouses Landscape confirms it: lakehouses are no longer experimental; they are the default architectural choice for modern analytics and AI.
This week on The Data, Lakehouse and AI Show, we cover one of the biggest weeks in the Apache lakehouse ecosystem in recent memory. Apache Polaris officially graduated to a Top-Level Apache Project, a milestone that signals just how central open catalog infrastructure has become to the modern data stack. We break down what graduation means, why it matters for Iceberg users, and what's next for the project.

On the Iceberg side, the 1.11.0 release is imminent with Spark 4.1 support, geo predicates, REST scan planning, and the long-awaited drop of Java 11 support. We also dive into the active Bloom skipping index sync and the growing conversation around efficient column updates for wide-table AI/ML workloads.

In chipset news, Meta and NVIDIA announced a landmark multiyear partnership, including the first large-scale deployment of NVIDIA Grace standalone CPUs, and Jensen Huang has teased a mystery chip announcement at GTC 2026 on March 16th. We'll be watching closely.

On the tools front, Apple dropped Xcode 26.3 with Claude Agent and OpenAI Codex baked in, OpenAI released GPT-5.2-Codex for long-horizon agentic coding, and the Anthropic 2026 Agentic Coding Trends Report confirms we're firmly in the multi-agent era.

Plus: Apache Arrow 23.0.1 shipped, Arrow Rust v58 is in RC, and we share a practical data modeling tip on using Iceberg's partition evolution to avoid costly table rewrites as your query patterns change.
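The partition evolution tip mentioned above can be sketched in Iceberg's Spark SQL DDL. This is a hedged illustration, not a transcript of the episode: the table name prod.db.events and the timestamp column event_ts are hypothetical, and it assumes a Spark session with Iceberg's SQL extensions enabled. Because an Iceberg partition spec change is metadata-only, the new spec applies to future writes while existing data files keep their original layout, so no table rewrite is required.

```sql
-- Hypothetical table prod.db.events, originally partitioned by month(event_ts).
-- Query patterns shifted to daily ranges, so evolve the spec in place.
ALTER TABLE prod.db.events ADD PARTITION FIELD days(event_ts);

-- Optionally retire the old monthly field; existing files are left untouched.
ALTER TABLE prod.db.events DROP PARTITION FIELD months(event_ts);
```

Queries planned after the change still prune correctly across both layouts, since Iceberg tracks which spec each data file was written under.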
Intro Episode

2026-01-14 · 00:48

Subscribe so you don't miss future episodes.
Events Page: Dremio.com/events
Get Started with Agentic Analytics: Dremio.com/get-started