320: Azure gives your Finops person a heart attack

Update: 2025-09-11

Welcome to episode 320 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are coming to you from Justin’s echo chamber and bringing all the latest in AI and Cloud news, including updates on Google’s antitrust case, AWS Cost MCP, new regions, EKS, Veo, Claude, and more! Let’s get into it.


Titles we almost went with this week:



  • Breaking Bad Bottlenecks: AWS Cooks Up Faster Container Pulls

  • The Bucket List: Finding Your Lost Storage Dollars

  • State of Denial: Terraform Finally Stops Saving Your Passwords

  • Three Stages of Azure Grief: Development, Preview, and Launch

  • Ground Control to Major Cloud: Microsoft Launches Planetary Computer Pro

  • Veo Vidi Vici: Google Conquers Video Editing

  • Red Alert: AWS Makes Production Accounts Actually Look Dangerous

  • Amazon EKS Discovers the F5 Key 

  • Chaos Theory Meets ChatGPT: When Your Reliability Data Gets an AI Therapist

  • Breaking Bad (Services): How AI Helps You Find What’s Already Broken

  • Breaking Up is Hard to Cloud: Gemini Moves Back In

  • Intel Inside Your Secrets: TDX Takes Over Google Cloud

  • Lord of the Regions: The Return of the Kiwi 

  • All Blacks and All Stacks: AWS Goes Full Kiwi

  • Azure Forecast: 100% Chance of Budget Alert Storms

  • Google Keeps Its Cloud Together: A $2.5T Near Miss

  • Shell We Dance? AWS Makes CLI Scripting Less Painful

  • AWS Finally Admits Nobody Remembers All Those CLI Commands

  • Cache Me If You Claude

  • Your AWS Console gets its Colors, just don’t choose red shirts

  • Amazon Q walks into a bar, tells MCP to order it a beer… The bartender sighs and mutters, “At least ChatGPT just hallucinates its beer.”

  • Ryan’s shitty scripts, now as an AWS CLI Library


A big thanks to this week’s sponsor:


We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.


General News


00:57 Google Dodges a $2.5T Breakup



  • We have breaking news – and it’s good news for Google. 

  • Google successfully avoided a potential $2.5 trillion breakup following antitrust proceedings, maintaining its current corporate structure despite regulatory pressure.

  • The decision represents a significant outcome for Big Tech antitrust cases, potentially setting a precedent for how regulators approach market dominance issues in the cloud and technology sectors.

  • Cloud customers and partners can expect business continuity with Google Cloud Platform services, avoiding potential disruptions that could have resulted from a corporate restructuring.

  • The ruling may influence how other major cloud providers structure their businesses and approach regulatory compliance, particularly around bundling services and market competition.

  • Enterprise customers relying on Google’s integrated ecosystem of cloud, advertising, and productivity tools can continue their current architectures without concerns about service separation.

  • You just KNOW Microsoft is super mad about this. 


AI Is Going Great – Or How ML Makes Money 


02:16 Introducing GPT-Realtime



  • OpenAI’s GPT-Realtime introduces real-time processing capabilities to GPT models, reducing latency for interactive applications and enabling more responsive AI experiences in cloud environments.

  • The technology leverages optimized model inference and architectural changes to deliver sub-second response times, making it suitable for live customer service, real-time translation, and interactive coding assistants.

  • Cloud providers can integrate GPT-Realtime through new API endpoints, offering developers the ability to build applications that require immediate AI responses without traditional batch processing delays (a rough sketch of the call pattern follows this list).

  • This development addresses a key limitation in current LLM deployments where response latency has restricted use cases in time-sensitive applications like live streaming, gaming, and financial trading systems.

  • For businesses running AI workloads in the cloud, GPT-Realtime could reduce infrastructure costs by eliminating the need for pre-processing queues and enabling more efficient resource utilization through streaming inference.
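
For a feel of the interaction model, here is a minimal Python sketch of a text round trip over the Realtime WebSocket endpoint. The URL, model name, headers, and event types reflect OpenAI’s Realtime API documentation at the time of writing (still beta-flavored conventions), so treat them as assumptions and check the current docs before copying.

```python
# Minimal text round trip against the Realtime API over a WebSocket.
# Endpoint, model name, and event types follow OpenAI's Realtime docs at the
# time of writing (beta conventions) and may have changed; verify before use.
import json
import os

import websocket  # pip install websocket-client

ws = websocket.create_connection(
    "wss://api.openai.com/v1/realtime?model=gpt-realtime",
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",
    ],
)

# Ask for a response; output streams back as many small delta events over the
# persistent connection, which is where the low perceived latency comes from.
ws.send(json.dumps({
    "type": "response.create",
    "response": {"modalities": ["text"], "instructions": "Say hello in one sentence."},
}))

while True:
    event = json.loads(ws.recv())
    if event.get("type") == "response.text.delta":
        print(event["delta"], end="", flush=True)
    elif event.get("type") == "response.done":
        break

ws.close()
```

The practical difference from a standard chat completion call is that output arrives as a stream of small delta events over a persistent connection instead of a single response body.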


02:58 Matt – “More AI scam calling coming your way.” 


Cloud Tools


04:14 Terraform provider for Google Cloud 7.0 is now GA



  • Terraform Google Cloud provider 7.0 introduces ephemeral resources and write-only attributes that prevent sensitive data, such as access tokens and passwords, from being stored in state files, addressing a major security concern for infrastructure teams.

  • The provider now supports over 800 resources and 300 data sources with 1.4 billion downloads, making it one of the most comprehensive infrastructure-as-code tools for Google Cloud Platform management.

  • New validation logic catches configuration errors during Terraform plan rather than apply, providing fail-fast behavior that makes deployments more predictable and reduces failed infrastructure changes.

  • Breaking changes in 7.0 align the provider with Google Cloud’s latest APIs and mark functionally required attributes as mandatory in schemas, requiring teams to review upgrade guides before migrating from version 6.

  • The ephemeral resource feature leverages Terraform 1.10+ capabilities to handle temporary credentials, such as service account access tokens, without ever writing them to the state file, and write-only attributes do the same for sensitive values you set on resources. This solves the long-standing problem of secret management in GitOps workflows; a short sketch follows below.
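
A minimal HCL sketch of the ephemeral-resource pattern, assuming Terraform 1.10+ and provider 7.x; the google_service_account_access_token ephemeral resource and its attributes are taken from the provider documentation, but verify the exact names against the 7.0 upgrade guide before relying on them.

```hcl
# Sketch only: assumes Terraform >= 1.10 and google provider 7.x. Confirm the
# ephemeral resource name and attributes against the 7.0 upgrade guide.
ephemeral "google_service_account_access_token" "deploy" {
  target_service_account = "deploy@my-project.iam.gserviceaccount.com"
  scopes                 = ["https://www.googleapis.com/auth/cloud-platform"]
}

# Ephemeral values are never written to the plan or state file, so the token
# can feed an aliased provider configuration without persisting anywhere.
provider "google" {
  alias        = "impersonated"
  access_token = ephemeral.google_service_account_access_token.deploy.access_token
}
```

The token is generated during plan/apply and discarded afterwards, so terraform.tfstate never contains it.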


05:19 Ryan – “I like the ephemeral resources; I think it’s a neat model for handling sensitive information and stuff you don’t want to store. It’s kind of a neat process.” 


06:50 How to get fast, easy insights with the Gremlin MCP Server



  • Gremlin’s MCP Server connects chaos engineering data to LLMs like ChatGPT or Claude, enabling teams to query their reliability testing results using natural language to uncover insights about service dependencies, test coverage gaps, and which services to test next.

  • The server architecture consists of three components: the LLM client, a containerized MCP server that interfaces with Gremlin’s API, and the Gremlin API itself – designed for read-only operations to prevent accidental system damage during data exploration.

  • This solves the problem of making sense of complex reliability testing data by allowing engineers to ask plain-English questions like “Which of my services should I test next?” instead of manually analyzing test results and metrics.

  • The tool requires a Gremlin account with a REST API key, an AI interface that supports MCP servers like Claude Desktop, and Node.js 22+ – making it accessible to teams already using Gremlin for chaos engineering (see the client sketch after this list).

  • During internal beta testing at Gremlin, the MCP server helped uncover production-impacting bugs before release, demonstrating its practical value for improving service reliability through AI-assisted data analysis.
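
Claude Desktop is the obvious client, but any MCP-capable client can talk to the server. The sketch below uses the official Python `mcp` SDK to start a server over stdio and list the tools it exposes; the npx package name and environment variable are placeholders, so substitute whatever Gremlin’s setup instructions actually specify.

```python
# Launch an MCP server over stdio and list the tools it exposes, using the
# official `mcp` Python SDK. The npx package name and environment variable
# below are placeholders; use the command from Gremlin's setup docs instead.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="npx",
    args=["-y", "@gremlin/mcp-server"],    # hypothetical package name
    env={"GREMLIN_API_KEY": "replace-me"},  # read-only REST API key
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```

From there, the LLM (or your own tooling) calls those read-only tools to answer questions like the dependency and coverage queries described above.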


07:38 Ryan – “It’s amazing they limited this to read-only commands, the API. I don’t know why they did that…it’s kind of neat to see the interaction model with different services.”


AWS


09:21 Introducing Seekable OCI Parallel Pull mode for Amazon EKS | Containers



  • AWS introduces SOCI Parallel Pull mode for EKS to address container image pull bottlenecks, particularly for AI/ML workloads where images can exceed 10GB and take several minutes to download using traditional methods.

  • The feature parallelizes both the download and unpacking phases, utilizing multiple HTTP connections per layer for downloads and concurrent CPU cores for unpacking, to achieve up to 60% faster pull times compared to standard containerd configurations.
