In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guests are Klemen Simonic, Co-Founder & CEO at Soniox, and Kwindla Hultman Kramer, Co-Founder & CEO at Daily.

Klemen Simonic is the CEO and Co-Founder of Soniox, where he leads the development of advanced voice AI models built for real-world performance. He brings over 16 years of experience across industry and academia, with a deep focus on artificial intelligence. He has worked on cutting-edge AI systems at Facebook, Google, Stanford University, and the University of Ljubljana. Klemen has been developing AI technologies since his undergraduate years, spanning speech, language, and large-scale knowledge systems.

Kwin is CEO and co-founder of Daily, a developer platform for real-time audio, video, and AI. He has been interested in large-scale networked systems and real-time video since his graduate student days at the MIT Media Lab. Before Daily, Kwin helped to found Oblong Industries, which built an operating system for spatial, multi-user, multi-screen, multi-device computing.

Recap Video

Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

Takeaways
* Voice AI adoption is slow because real-time transcription still breaks on the most basic parts of a customer call.
* Real growth is happening quietly inside call centers, but teams won’t scale until transcription stops causing cascading errors.
* Even the top models fail on emails, addresses, and alphanumerics, which are the single points of failure in most B2B workflows.
* Consumer-grade demos hide the reality that long, multi-turn conversations still fall apart without rigorous context control.
* POC-to-production fails not because of LLMs, but because engineering teams underestimate context management.
* A universal multilingual model can outperform single-language models by transferring entity knowledge across languages.
* Mixed-language conversations are the norm worldwide, and current systems break the moment a user switches language.
* Latency, accuracy, and cost must be solved at the same time; optimizing only one kills the use case.
* Feeding both sides of the conversation into STT gives models more context and improves accuracy.
* Domain-specific accuracy matters far more than general accuracy, and most models still fail in specialized environments.
* Industry “context boosting” tricks are hacks that break at scale; native learned context inside STT is the only path forward.
* Punctuation and intonation directly shape LLM reasoning, and stripping them for speed creates silent failure modes.
* Voice AI is shifting from speech-to-text to full speech understanding, and models that don’t evolve won’t survive.
* The future points toward fused audio plus LLM architectures that remove the brittle STT handoff entirely.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
In this special edition of the Future of Voice AI series of interviews, we're joined by industry vets to unpack:
- How clarity became a measurable KPI for CX quality and trust
- How TTEC identified and solved global voice challenges across regions
- Real results: customer satisfaction, agent confidence, cost efficiency improvements, and more

This episode’s guests are TTEC’s James Bednar, VP of Innovation and Product, and Biju Pillai, VP of India Operations.

As voice remains the most human and high-stakes channel, global contact centers like TTEC face a growing challenge: how to deliver effortless understanding across accents, environments, and expectations.

In this live session, TTEC leaders and Krisp’s CEO unpack the business case for clarity — sharing how they transformed challenges into measurable wins, including how they turned fragmented communication into a unified standard across global operations. You’ll hear what worked, what didn’t, and how AI-driven voice clarity has become a core pillar of TTEC’s customer and agent experience strategy.

Takeaways
1. Clarity drives measurable ROI.
* After Krisp deployment, noise complaints dropped 76%, sales conversions rose 26%, and CSAT improved 8%.
* These are not pilot numbers; they came from sustained production environments across thousands of agents.
2. Accent conversion unlocks new talent pools.
* By eliminating accent barriers in real time, TTEC could hire for skill, not sound. “We don’t want to hire the right accent, we want to hire the right talent,” James said.
* This reduced reliance on costly and inconsistent “voice coaching” programs, creating what Pillai called an “always-on coach.”
3. 80+ NPS from offshore delivery proves the point.
* An India-based program reached 80+ NPS, with language-barrier reports cut in half (2.6% → 1.2%) and experience scores rising from 90.5% to 95.5%.
* Each new Accent Conversion model release (v3.5 → v3.7) corresponded to higher NPS, peaking at 85 in September 2025.
4. Cost efficiency without quality compromise.
* Offshore voice delivery using Krisp achieved ~70% cost savings versus onshore U.S. teams. “Clients that once said India isn’t where you go for voice are rethinking that.”
5. Agent wellbeing and empathy improved.
* Agents reported lower fatigue, faster understanding, and higher confidence. Biju noted, “calls now flow better—agents no longer overcompensate for accent or tone.”
* That confidence translated into trust and empathy, making every conversation feel more human.
6. Next frontier: real-time translation and pacing intelligence.
* With accent conversion now near full maturity, Krisp is launching Accent Conversion v4.0, tackling pacing and accent leakage.
* Inbound accent conversion and real-time translation will soon close the loop to help both agents and customers understand each other.

This isn’t just a story about cleaner audio. It’s about turning clarity into confidence, confidence into empathy, and empathy into measurable ROI.

As James put it: “These use cases just work. They deliver what’s expected, with almost no effort to deploy.”
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Will Bodewes, CEO at Phonely.ai.

Will Bodewes is the Co-founder and CEO of Phonely.ai, a Y Combinator–backed startup building conversational phone support powered by AI. A lifelong competitor and creator, he earned a mechanical engineering degree from UNH and launched his first company, Spoke Sound, soon after. Following AI research and travels across Africa, Asia, and the Pacific, Will combined his technical background and curiosity to take on one of tech’s toughest challenges: making AI sound human.

Phonely provides AI-powered phone support agents for industries requiring fast, reliable, and human-like AI interactions. Its AI solutions reduce wait times, improve customer experiences, and enable seamless automated conversations.

Recap Video

Takeaways
* Voice AI jumped from niche to movement in two years, with young builders driving it.
* Reliability at scale beats clever prompts; buyers want systems that just work.
* Time-to-value is the moat; months of coding kills deals.
* Every AI agent succeeds only if it knows what to say, what to know, and what to do: conversation, context, and action.
* Integrations are the choke point; the hard work is plumbing messy CRMs and legacy tools.
* Training BPO teams to build on the platform scales better than flying in engineers.
* LLMs are the latency bottleneck, so faster tokens = more human conversations.
* Groq partnership delivered lower latency and beat big names on some Phonely benchmarks.
* “Did the caller detect it wasn’t human?” is a better quality metric than WER.
* Phonely claims 100% function-calling accuracy in production, which is what buyers actually feel.
* Low ASR confidence should trigger human-like behavior (ask to spell names), not clunky links.
* Capturing names, numbers, and addresses is the last-mile blocker; fix this or nothing else matters.
* Cascading still wins for business logic; speech-to-speech isn’t reliably deployed in production.
* Best near-term wins: customer support with tight FAQs, lead qual, and appointment setting.
* Defined outcomes plus A/B testing lets agents match call-center KPIs at 50–70% lower cost.
* Enterprise rollout will be gradual (2–3 years) until hallucination fear fades.
* The next unlock is LLMs that talk like people while staying fast and precise.
* Expect convergence where “voice-to-voice” and cascading blur, but LLMs keep the reasoning core.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is David Yang, Co-Founder of Newo.ai.

David Yang, Ph.D., is a Silicon Valley–based serial entrepreneur and co-founder of Newo.ai. He previously founded ABBYY, a global leader in AI and content intelligence whose technologies serve over 50 million users and thousands of enterprises in 200 countries. Over his career, Dr. Yang has launched more than a dozen companies, contributed to major advances in AI and workplace technology, and has been recognized by the World Economic Forum as one of the top 100 World Technology Pioneers.

Newo.ai is a San Francisco–based AI technology company building human-like AI Agents that transform how businesses operate. Founded by AI entrepreneurs David Yang, Ph.D., and Luba Ovtsinnikova, the team brings a track record of launching more than 10 successful companies whose products are used by over 50 million people in 200 countries. Newo.ai’s mission is to unleash the superpowers of small and medium businesses by giving every entrepreneur an AI teammate that never sleeps, never gets tired, and helps turn the impossible into the inevitable.

Recap Video

Takeaways
* Building agents at scale is the real moat; the need is hundreds to thousands of production-ready agents per month, not one-offs.
* “Production in minutes” matters more than fancy demos; zero-touch setup wins.
* Most SMBs will swap IVRs and voicemail for AI receptionists that actually book and drive revenue.
* Websites will become conversational; voice and chat agents will greet, qualify, and convert visitors.
* Industry templates (patients vs. guests, dental vs. hotels) let one agent fit ~90–95% of use cases out of the box.
* Voice is the hardest and most important channel—latency, interruptions, accents, and noise make it 80% of the problem.
* The real production hurdle for AI agents was latency; agents need to “think and talk” at once to feel human.
* One bad call in ten kills trust and scalability; parallel “observer” agents that fact-check in real time are needed to prevent hallucinated bookings.
* Adoption inflects when AI’s “lead success score” approaches human performance; businesses tolerate errors at human-like rates.
* Omnichannel isn’t optional for SMB reception; phones, SMS, live chat, and social DMs all feed bookings.
* New industries are lighting up weekly; speed of verticalization is a competitive weapon.
* The success metric is parity with humans, not perfection; once the lead success score nears human levels, growth takes off.
* The near future is practical and paid; AI receptionists that cost little and return 50x in booked revenue will win long before sci-fi visions do.
* Long-term, David sees a chunk of the world’s knowledge work shifting to “non-biological” employees, forcing new ethics and norms.
* David predicts that 300 million of the world’s 1 billion knowledge workers could be AI-based in the future.
* Humans and machines are moving toward a hybrid future where biological beings have non-biological implants and vice versa.
* Early emotional AI like Morpheus was designed with synthetic “oxytocin” and “dopamine” and even used architecture (moving walls) to mirror emotional states.
* Robotic pets and AI systems living alongside humans foreshadow non-biological members of society becoming normal.
* AI systems aren’t deterministic, raising the need for new ethical frameworks beyond Asimov’s Three Laws.
* Morality and shared values will need to be trained into AI, as decisions often fall into gray areas.
In this special edition of the Future of Voice AI series, we welcome leading voices on the state of voice AI in CX:
- Nicole Kyle of CMP Research on CX market data and shifting priorities
- Kwindla Hultman Kramer of Daily on building and scaling voice AI agents
- Brent Stevenson of IntouchCX on AI adoption on the frontlines

Fullband 2025 brought together research, technology, and frontline leaders to cut through the hype and show where voice AI is actually working today. Here’s the distilled recap.

The State of Voice AI in CX
Nicole Kyle, Managing Director & Co-Founder of CMP Research

Nicole leads groundbreaking research on customer contact and shared why voice remains essential in CX and how priorities are shifting in an AI-driven era.

3 Takeaways:
* Voice is still the biggest automation prize because it carries the most volume.
* Interest in self-service is high, but adoption lags due to poor experiences.
* Leaders are shifting from GenAI hype to use-case deployments and early agentic AI.

Stat to remember: Only 3% of customers prefer conversational voice AI for self-service today, driven by quality gaps, not lack of interest.

What you can do today: Pick one high-volume voice use case and lift quality: define success, measure completion rate and CSAT, and iterate until adoption rises.

The State of Voice AI Agents
Kwindla Hultman Kramer, CEO & Co-Founder of Daily

Kwindla is pioneering real-time AI agents for voice and video. He unpacked what it takes to build, scale, and deploy AI agents that actually work.

3 Takeaways:
* Enterprises moved from curiosity to concrete agent roadmaps in just 12 months.
* Constrained, vertical agents (esp. outbound) are finding product–market fit faster than broad platforms.
* 2024 solved plumbing (turn-taking, latency); 2025 is about natural conversations and reliable structured data capture.

Stat to remember: One in three enterprises is already in production with AI agents.

What you can do today: Choose one constrained workflow (e.g., outbound confirmation calls). Define latency and handoff goals, then launch and tune before scaling.

The State of Voice AI in BPOs
Brent Stevenson, Chief Experience Officer of IntouchCX

Brent offers a frontline view of CX transformation, showing how BPOs adopt AI by fixing workflows and blending automation with human expertise.

3 Takeaways:
* Workflow design and governance are the real blockers, not technology.
* BPOs are acting as “AI administrators,” with QA analysts repurposed into agent trainers and prompt engineers.
* Agent assist is now table stakes, with translation and accent conversion expanding labor pools and market access.

Stat to remember: Agent assist at IntouchCX delivered ~10% AHT reduction, 3–5% CSAT lift, and 20% faster agent ramp.

What you can do today: Stand up an “AI QA” function to own prompts, tuning, and bot governance — manage AI like you manage human agents.

Q&A
The event was jam-packed, and we couldn’t get to every question live. Here are the ones we missed.

1. Nicole, the research shows a lot of shifting priorities. Which do you think will have the biggest long-term impact on how companies invest in voice AI?

The need to increase customer adoption of self-service will have the biggest long-term impact on how companies invest in voice AI. Everything depends on the quality of the experience. If voice AI delivers a high-quality interaction, voice becomes the channel with the most to gain—from greater customer adoption of automation to significant cost savings through deflection. But if the solution falls short, it risks damaging the customer experience. It’s a classic case of high risk, high reward.

2. Nicole, where do you see the biggest gaps between executive priorities and the technology that’s actually available today?

This isn’t a fun answer, but honestly: knowledge base management and governance. Good knowledge is the key ingredient to making an AI (conversational, generative, or agentic) function properly. And it’s hard for most customer contact and CX organizations to manage right now. They’re looking for AI solutions that can proactively add to and audit the knowledge base, but there’s a gap in the market right now.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Neil Hammerton, CEO & Co-Founder at Natterbox.

Neil Hammerton is CEO of Natterbox. Neil co-founded the UK telecoms disrupter in 2010 with the aim of transforming the business telephony experience of firms and their customers. Today, Natterbox works with over 250 businesses around the world to improve data integration through CRM within Salesforce. Natterbox enables them to put the telephone at the heart of their customer service strategy and guarantee high standards across their customer service experience.

Natterbox is the AI-powered contact center platform redefining how Salesforce-first businesses connect with customers. Drawing on 15+ years of contact center expertise, we help leading organizations to effortlessly incorporate AI into their contact center operations and seamlessly blend AI with their contact center workforce to deliver optimal customer experiences.

Recap Video

Takeaways
* Deep CRM-native integration beats bolt-ons because it keeps full context across every call and channel.
* Real-time summaries turn each call into structured data the next agent or bot can use on the spot.
* Recording and transcribing every call is the foundation for smart routing, compliance, and coaching.
* AI should own the simple, high-volume tasks while humans handle exceptions and emotion.
* The biggest CX drag is “tell me your story again”; carry context forward and it disappears.
* Wait times drop fastest when AI does first response and triage before a human ever picks up.
* Let bots update Salesforce during the call so agents don’t burn time on after-call work.
* Building your own telephony stack gives control over quality, latency, and feature pace.
* Measure success by resolution and customer effort, not just bot containment or call deflection.
* Most customers won’t dig through a website; they call—meet them with fast, guided answers.
* AI without a clean handoff path back to humans will frustrate users and spike churn.
* Automate the top three intents end-to-end first, prove value, then expand the surface area.
* Use history plus live intent to route to the right bot or human in seconds, not minutes.
* Keep transcripts and actions inside Salesforce so data is secure, searchable, and actionable.
* Voice is still the highest-stakes channel; small gains here move CSAT, FCR, and churn in a big way.
* Offload repetitive calls to AI and agents get happier, faster, and more effective.
* “AI first, human-in-the-loop” is the practical path for the next 12–24 months—not full automation.
* The win isn’t flashy AI; it’s consistent outcomes: faster answers, fewer transfers, better follow-through.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guests are Anshul Shrivastava, Co-Founder and CEO, and Kumar Saurav, Co-Founder and CTO, at Vodex.ai.

Vodex specializes in Generative AI-powered voice agents that facilitate natural, humanlike conversations with customers. These virtual agents manage the initial phases of customer interactions, offering businesses a scalable and efficient way to handle inbound and outbound sales and collections calls. By personalizing conversations and providing real-time insights, Vodex helps businesses improve engagement and streamline processes.

Anshul Shrivastava is the Co-Founder and CEO of Vodex.ai, with 12+ years in the IT industry and a strong focus on AI innovation. He leads Vodex.ai in building global AI solutions, aiming to drive growth and deliver real impact for clients. Anshul views technology as a catalyst for progress and is passionate about shaping the future of AI.

Kumar Saurav is the Co-Founder and CTO of Vodex.ai, where he drives the development of generative AI solutions for business. With 13+ years across IT, IoT, Robotics, and AI, he brings both technical depth and business insight to solving client challenges. At Vodex.ai, he focuses on AI-powered outbound call solutions that boost sales, service, and marketing performance, while sharing his expertise through writing and research.

Recap Video

Takeaways
* Voice AI still hasn’t had its ChatGPT moment because people hate talking to bots that feel slow or robotic.
* Latency is the deal breaker — anything slower than 300ms breaks the illusion of real conversation.
* Cascading pipelines lose tone, emotion, and context, making bots sound flat and unreliable.
* Speech-to-speech models are the real unlock, combining speed with emotional nuance.
* Most voice AI agents are stitched together from ASR, LLM, TTS, and telco layers.
* Vodex positions itself as the “Stripe of voice AI” with simple plug-and-play APIs.
* Vertical focus matters, and collections is their strongest domain with strict FDCPA compliance.
* Naturalness moves revenue, with one Arabic deployment lifting recovery from 45% to 81% in seven days.
* Naturalness is not a “nice to have” — it directly drives revenue and customer trust.
* The bar is rising fast; in two years robotic-but-functional bots will be unacceptable.
* Proven sweet spots for voice AI right now: lead qualification, debt collection, healthcare scheduling, and follow-ups.
* Vodex’s origin story shows the shift from slow custom builds to no-code, plug-and-play bots for non-technical users.
* Context engineering and AI-on-AI testing are how they handle edge cases and reliability gaps.
* The future of voice will run on small, task-specific speech models built for speed and accuracy.
* Gen Z decision makers will push companies to embrace talking to systems instead of clicking around apps.
* Vodex rejects cold-call spam, betting that contextual, consent-based conversations will define the industry.
* Soon, every company will be expected to have a natural voice agent the same way every company is expected to have a website.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Peter Ryan, President and Principal Analyst at Ryan Strategic Advisory.

Peter Ryan is recognized as one of the world’s leading experts in CX and BPO. Throughout his career, Peter has advised CX outsourcers, contact center clients, national governments, and industry associations on strategic matters like vertical market penetration, service delivery, best practices in technology deployment, and offshore positioning.

Ryan Strategic Advisory provides market insight, brand development initiatives, and actionable data for organizations in the customer experience services ecosystem. With two decades of experience, Ryan Strategic Advisory supports outsourcing operators, technology providers, industry associations, and economic development agencies.

Recap Video

Takeaways
* The hype cycle around AI has made it hard for CX leaders to separate real progress from inflated promises.
* Adoption of voice AI is moving from concept to mainstream, driven by accuracy, latency improvements, and reliability.
* Customers care most about issue resolution, not whether the agent sounds robotic or perfectly human.
* One bad phone experience, often caused by language or accent misunderstandings, can permanently lose a customer.
* Nearly half of surveyed enterprises are already using AI-powered voice translation, showing trust in its growing value.
* About a quarter are experimenting with or adopting AI accent conversion, a big leap from just a few years ago.
* Accent technology is not just for customers; it reduces agent stress and helps retain frontline workers.
* Better agent retention directly lowers costs tied to recruiting, training, and high attrition.
* Frontline agents are often more enthusiastic about accent technology than executives, because it eases real pain in daily calls.
* CX leaders see accent and translation tools as a way to improve loyalty by making communication effortless across borders.
* Latency in AI responses is no longer the barrier it once was—customers tolerate small delays if accuracy is high.
* The biggest risk with AI in CX is overpromising; pragmatic, real-world use cases drive adoption faster than hype.
* Failed AI deployments are often rolled back, especially with voice bots that don’t meet expectations.
* Real-world case studies are becoming essential for buyers to justify investments in a tight economic climate.
* CX voice AI adoption has followed a clear path: noise cancellation first, then accent tools, now translation at scale.
* The next wave of adoption depends on showing measurable business outcomes rather than futuristic demos.
* AI in CX today is compared to Pentium processors in the 90s: a turning point that accelerates everything once it matures.
* Companies that promise realistically and deliver consistently will win long-term trust in a crowded AI market.
* The real test of AI in CX isn’t novelty—it’s whether it helps customers resolve issues faster, cheaper, and with less friction.

Check out last week’s article to dive deeper into the data discussed in this episode.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Assaf Asbag, Chief Technology and Product Officer at aiOla.

Assaf Asbag is the CPTO at aiOla, leading AI-driven product innovation and enterprise solutions. He previously served as VP of AI at Playtika, where he built the AI division into a key growth engine. Assaf’s background includes advanced algorithm work at Applied Materials and leadership across engineering and data science teams. He holds B.Sc. and M.Sc. degrees in Electrical and Computer Engineering with a focus on machine learning from Ben-Gurion University, making him a recognized expert in AI and technology strategy.

aiOla's patented models and technology support over 100 languages and discern jargon, abbreviations, and acronyms, demonstrating a low error rate even in noisy environments. aiOla's purpose-built technology converts manual processes in critical industries into data-driven, paperless, AI-powered workflows through cutting-edge speech recognition.

Recap Video

Takeaways
* Turning spoken language into structured data in noisy, multilingual, and jargon-heavy environments is the real differentiator for enterprise voice AI.
* Standard ASR models fail in frontline industries due to heavy accents, domain-specific vocabulary, and constant background noise.
* Zero-shot keyword spotting from large jargon lists without fine-tuning can drastically cut setup time for specialized speech recognition.
* Building proprietary, noise-heavy training datasets is essential for robust ASR performance in the real world.
* Synthetic data generation that blends realistic noise with text-to-speech can cheaply scale model adaptation for niche environments.
* Real-time processing is critical to making voice the primary human–technology interface, especially for operational workflows.
* Voice AI has massive untapped potential among the world’s billion-plus frontline workers, far beyond current call center focus.
* Incomplete or missing documentation is a hidden cost that voice-first tools can solve by capturing richer, structured information on the spot.
* Effective enterprise AI solutions often require both a core product and flexible integration layers (SDK, API, or full app).
* Trustworthy AI for voice will require guardrails, watermarking, bias detection, and context-aware filtering.
* The next leap in conversational AI will be personalized, real-time adaptive systems rather than today’s generic emotion mimicking.
* Designing for multimodal interaction (voice, text, UI) will be as important as model accuracy for user adoption.
* AI revolutions historically create more jobs than they displace, but require new roles in monitoring, reliability, and context engineering.
* Future speech AI should emulate human listening: diagnosing issues, correcting in real-time, and adapting based on cues like pace, volume, and accent.
In the Future of Voice AI series of interviews, I ask three questions to my guests: - What problems do you currently see in Enterprise Voice AI? - How does your company solve these problems? - What solutions do you envision in the next 5 years?This episode’s guest is Jack Piunti, GTM Lead for Communications at ElevenLabs.Jack Piunti is the GTM lead for Communications at ElevenLabs, where he oversees go-to-market strategy across CPaaS, CCaaS, UCaaS, and customer experience. With a strong background in consultative technology partnerships and startup growth, Jack brings deep expertise in AI-driven communications. Prior to ElevenLabs, he spent six years at Twilio, helping shape enterprise adoption of real-time voice technologies. He is passionate about the future of connected applications and the role of AI in transforming how we communicate.ElevenLabs is a voice AI company offering ultra-realistic text-to-speech, speech-to-text, voice cloning, multilingual dubbing, and conversational AI tools. Founded in 2022, it enables creators and developers to build voice apps and generate lifelike, emotionally rich speech in 70+ languages. Its latest models support expressive cues and multi-speaker dialogue. Recap VideoThanks for reading Voice AI Newsletter! 
Takeaways
* Most AI failures in conversation don't come from the language model, but from inaccurate speech-to-text at the start.
* Bad transcription of critical details like names or codes breaks the entire user experience and can’t easily be recovered.
* Accurate speech-to-text is now a make-or-break factor for building reliable AI agents.
* Voice will soon replace typing as the main way humans interact with machines because it's more natural and efficient.
* Enterprises don’t want to stitch together multiple AI vendors; they want end-to-end platforms that simplify the stack and reduce latency.
* Demos often look impressive, but very few companies can scale real-time voice tech reliably in production environments.
* AI voice agents that sound expressive aren't enough — turn-taking and accuracy are still bigger challenges.
* Most companies ignore accessibility in AI, but modeling things like stuttering actually improves agent behavior.
* Streaming speech and voice models will unlock more lifelike, responsive AI agents — and it’s coming fast.
* Audio AI needs deep expertise beyond AI, including sound engineering and context-aware modeling of human speech.
* There’s a growing trend of AI companies going beyond voice to control the full audio experience, including music and sound effects.
* The way voice models are trained is fundamentally different from language models and requires much cleaner training data.
* Many agentic AI builders today are forced to cobble together solutions from different vendors, which creates delay and complexity.
* True real-time voice AI must handle language switching, emotional cues, and speech disfluencies automatically to feel natural.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Bryce Cressy, VP of Strategic Solutions at Nutun.

Bryce Cressy is the VP of Strategic Solutions at Nutun, where he leads innovation, AI integration, and process optimization across global CX and collections programs. With deep expertise in partnerships and outsourcing, he helps clients futureproof their contact center operations by combining human talent with transformative technology. Based in South Africa, Bryce is a vocal advocate for the region’s rise as a high-skill BPO hub, and works closely with enterprise leaders in the US and UK to design tailored, tech-forward customer experiences.

Nutun is a global BPO headquartered in South Africa, specializing in customer experience and debt collection services for clients in the US, UK, Australia, and beyond. With 30 years of industry experience and a strong foundation in collections, Nutun blends skilled human talent with cutting-edge AI to deliver high-impact, scalable solutions. Nutun is redefining offshore CX by combining local expertise, robust infrastructure, and a commitment to continuous innovation.

Recap Video
Takeaways
* AI only works when solving specific, targeted problems; using it as a blanket solution guarantees failure.
* The term "agentic AI" is being overused without a shared definition, creating more confusion than clarity.
* South Africa's time zone, infrastructure, educated talent pool, and English fluency give it a global CX advantage.
* Contact center jobs are now aspirational in South Africa, offering career paths from agent to executive.
* Voice still dominates support channels, but without Voice AI, BPOs risk becoming obsolete.
* Escalation design is the most critical aspect of Voice AI adoption; bad handoffs will break customer trust.
* Voice bots should never trap customers in AI-only loops without access to a human.
* Companies afraid of AI hallucinations start with agent-assist tools, not bots—it's a low-risk entry point.
* Clear audio is make-or-break for AI accuracy, especially in noisy environments like collections.
* IVR menus are outdated; conversational routing with AI voice agents is the new standard.
* Smart BPOs are flipping the model, letting humans hand off to bots for routine tasks, not the other way around.
* Voice AI isn't just a cost play; it's a CX differentiator that drives loyalty and efficiency.
* Many vendors sound the same; what matters is whether their tech solves a real, measurable problem.
* AI voice agents won't kill human support; they'll triage it—handling volume while preserving empathy.
* Customers need to know a human is always available or they'll lose confidence in the brand.
* The future of BPOs lies in combining process consulting with selective, surgical AI integration.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next few years?

This episode’s guest is Sharang Sharma, Vice President at Everest Group.

Sharang Sharma is a Vice President at Everest Group, where he leads research and advisory in Business Process Services with a focus on customer experience management. He works closely with CX leaders to help them navigate digital transformation, AI adoption, sourcing, and operational strategy. Sharang brings deep insight into how technologies like voice AI, accent neutralization, and translation are reshaping global support models.

Everest Group is a global research and advisory firm headquartered in Dallas. The company provides deep insights into business process services, technology, and customer experience, helping organizations navigate innovation and operational transformation across industries.

Recap Video
Takeaways
* AI is now setting the standard for customer experience, not just helping it.
* Voice is becoming a top channel again because of real-time AI improvements.
* Accent and language tools are changing what it means to deliver global CX.
* Companies used to ignore voice, but AI is making it faster, smarter, and easier to scale.
* CX is where AI is being tested the most because it needs to be accurate and cost-effective at large scale.
* After years of shifting away from voice, AI is bringing it back as a preferred support option.
* Voice AI helps companies hire globally by making accents less of a barrier.
* Translation AI is still early and not yet reliable, much as accent tech was a year ago.
* Accuracy is a bigger issue than speed when it comes to using translation AI in real-time calls.
* Companies shouldn’t expect perfect AI, but should ask if it’s better than the other options.
* What counts as good CX now is shifting toward clarity, empathy, and smarter service.
* Working with AI to support humans is more reliable right now than using bots alone.
* People often think CX work is dull, but it depends on human connection and emotion.
* Small improvements from AI are adding up to major gains in customer experience.
* AI is growing in CX not just because of the tech, but because it opens cheaper and wider hiring options.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Justin Robbins, Founder and Principal Analyst at Metric Sherpa.

Justin Robbins is the founder of Metric Sherpa, an independent analyst firm helping CX leaders, investors, and solution providers cut through the noise and make confident decisions. With a career spanning both the front lines and boardrooms of CX, Justin brings clarity to a fast-changing market. He’s built frontline support teams, advised global brands, and knows firsthand what drives real impact—from the operations floor to executive strategy. Through his research, insights, and content, Justin equips businesses not just with information, but with the direction they need to act.

Metric Sherpa is an analyst firm built on the belief that CX insights should lead to action. The firm was born out of real-world experience in customer operations and a need for clarity in a crowded, fast-moving space. Metric Sherpa helps solution providers, investors, and business leaders find the meaningful signal in the noise, translating market trends into decisions that matter.

Recap Video
Takeaways
* Metric Sherpa focuses on turning insights into clear decisions and actions.
* Most CX data isn’t useful until it’s made actionable for business decisions.
* There’s a major disconnect between what leaders think customers want and what customers actually experience.
* AI deployment is exposing how broken and outdated most companies’ knowledge bases really are.
* Critical knowledge still lives in employees’ heads, with no scalable way to capture or share it.
* Poor knowledge directly causes AI agents to fail and lose companies money.
* AI is finally forcing organizations to treat knowledge management as a core function, not an afterthought.
* Companies rely on a few internal champions to maintain knowledge, which collapses when they leave.
* Automatically generating knowledge from conversations is possible, but requires human oversight to ensure accuracy.
* AI can accelerate documentation by drafting knowledge articles that humans refine.
* Many leaders claim they’ll reinvest AI savings, but higher-ups often prioritize headcount cuts instead.
* Organizational workload won’t decrease with AI; new complexity and tasks will quickly fill the gap.
* More efficient support channels could increase usage, driving higher inbound volumes.
* Most contact centers are understaffed due to poor visibility into agent productivity and shrinkage.
* Frontline agents are starting to take on new roles coaching and guiding AI systems in real time.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is James Bednar, VP of Product and Innovation at TTEC.

James Bednar is VP of Product and Innovation at TTEC, where he leads strategy at the intersection of CX operations and emerging tech. With a background in cognitive science and over 20 years in the industry, he brings a human-centered lens to AI innovation, helping global brands scale meaningful customer experiences.

TTEC is a leading global CX technology and services innovator for AI-enabled digital CX solutions. Serving iconic and disruptive brands, TTEC's outcome-based solutions span the entire enterprise, touch every virtual interaction channel, and improve each step of the customer journey. Founded in 1982, TTEC’s employees operate on six continents and bring technology and humanity together to deliver happy customers and differentiated business results.

Recap Video
Takeaways
* Over the past few years, the contact center industry has shifted from people-based services to tech-dominated solutions.
* Empathy was once the top CX priority but has taken a backseat to speed and automation in recent years, especially post-COVID.
* Post-COVID impatience has changed customer expectations—people now prioritize fast resolution over emotional connection.
* AI may never feel empathy like a human, but it can simulate enough of it to meet rising expectations for speed.
* Younger generations are building emotional trust with AI, even consulting it for major life decisions more than their parents.
* The idea that empathy is a uniquely human trait may no longer hold up as AI gets more advanced and socially accepted.
* Most contact center training still forces fake empathy through scripts, which can backfire and hurt customer trust.
* Disingenuous empathy can be worse than showing no empathy at all.
* Real customer satisfaction comes more from issue resolution than emotional expression alone.
* AI may be better than humans at consistently identifying emotional cues in conversations, even if it’s not perfect yet.
* Traditional QA processes still rely on subjective human judgments of empathy, which lack consistency and scalability.
* Trust plays a major role in perceived empathy—users trust AI more when it provides helpful and consistent answers.
* The industry may need new KPIs to measure the genuineness or effectiveness of empathy, especially in AI-led interactions.
* The balance between empathy and speed is evolving, and AI might soon outperform humans in delivering both at scale.
* TTEC breaks down empathy’s ROI in the contact center in the age of AI in their latest report: “Is Empathy Overrated?”
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This CCW episode features guest Jordan Zivoder, Quantitative Research Lead at Customer Management Practice (CMP Research).

Jordan Zivoder has 10 years of experience in Market Research and Voice of the Customer and leads quantitative research and analysis for CMP Research, Customer Management Practice’s dedicated independent insights and research product. With a primary focus on empowering executives to make data-driven decisions, Jordan combines expertise in survey research with machine learning to deliver unparalleled understanding of the customer and employee experience.

CMP Research delivers unlimited advisory support, diagnostic tools, and data-driven insights to help customer contact & CX executives optimize experience, technology, and operations, while enabling solution providers with go-to-market strategies and customer insights—all powered by the organization behind Customer Contact Week.

Recap Video

Takeaways
* Rising cost pressures are shifting priorities toward automation and self-service instead of hiring, changing how leaders approach customer support.
* AI is helping agents do better work faster. Companies can boost performance without replacing people.
* One bad self-service or bot experience can damage customer trust and stall long-term adoption.
* Even as AI gets smarter, customers still expect clear access to a human—over-automation risks breaking trust.
* Leaders and agents have different views on what matters most. Closing that gap is key to strong performance and retention.
* Executives overestimate the impact of culture while agents care more about good managers, flexibility, and career growth.
* Internal tools like Agent Assist are a safer way to test AI performance and reduce risk before deploying customer-facing automation.
* AI only works well if the data behind it is accurate and up to date. Bad information leads to poor results and failed launches.
* Contact centers are rich in conversation data, but few use it well. Those who miss this opportunity fall behind.
* The best teams feed call data into AI tools to fill knowledge gaps and continuously improve performance.
* New AI tools can detect missing knowledge and automatically update content, creating a self-improving feedback loop.
* AI adoption forces companies to treat knowledge management as a core priority, not an afterthought.
* AI’s value is not just in automating conversations but in creating systems that help both bots and humans improve over time.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Jonathan Keane, Co-Founder and CEO at CustomerHD.

Jonathan Keane is the Co-Founder and CEO of CustomerHD. Over the past 15 years, he has led customer service operations for both high-growth startups and established brands, always focused on using service as a competitive advantage.

His perspective shifted after years of visiting call centers, starting in 2007, where he saw firsthand the challenges frontline teams faced and the lack of respect too often shown to them. That experience sparked a clear mission: to build a better model for outsourced support.

At CustomerHD, Jonathan is creating a company where people are genuinely valued and culture comes first. He’s leading a team that’s proving great customer service starts with taking care of the people who deliver it.

Recap Video
Takeaways
* CustomerHD is a customer support outsourcing company that combines human agents with AI tools to deliver scalable, high-quality customer service, primarily from English-fluent teams in Belize.
* Voice is still the hardest form of customer interaction, requiring real-time clarity, empathy, and precision.
* Noise cancellation has reached near-perfection and is now foundational in voice customer support.
* Voice translation is still immature, especially outside common language pairs, and is expensive with current human-based solutions.
* Real-time transcription is the biggest bottleneck in AI translation pipelines, often corrupting the accuracy of entire conversations.
* Code-switching (mixing languages mid-sentence) is a major challenge for current transcription systems.
* Simpler, faster-to-deploy AI tools are crucial for small and mid-sized support teams without enterprise budgets.
* The technology that works most reliably today is technology that helps humans.
* AI is moving customer service agents into more advanced roles where they review data and help improve how the AI works.
* AI is creating new career paths that involve tech skills and system optimization.
* Human agents will evolve into AI trainers and QA analysts, tuning and improving systems with real-world customer insights.
* Prompt engineering is becoming a critical new skill for support agents transitioning into tech-facing roles.
* Multilingual AI could unlock new markets, but latency and technical jargon comprehension are still blockers.
* Customer education around AI limitations (like translation delays) could improve satisfaction and acceptance.
* Auto QA driven by GenAI transforms quality assurance from manual listening to data-driven coaching.
* The AI-human relationship is flipping: agents not only use AI but actively help optimize it.
* Belize offers unique advantages for voice support with its English-first, multicultural workforce.
* AI will always need human input for edge cases, emotional nuance, and complex product knowledge.
* The shift toward AI-supported service is reshaping customer service into a high-skill, technical career path.
* Jonathan sees AI not as a threat, but as a massive upskilling opportunity for his workforce.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Francisco Izaguirre, Engineering Lead at 11x.

Francisco Izaguirre is the Engineering Lead at 11x, where he’s helping redefine how voice AI powers modern revenue teams. With a background spanning backend systems, ML infrastructure, and conversational AI, Francisco currently leads development on Julian—11x’s real-time voice agent built to autonomously handle sales calls, qualify leads, and book demos. His work sits at the intersection of latency optimization, emotional intelligence in AI, and scalable agent design, making him one of the most hands-on builders in the emerging voice AI space.

Recap Video

Takeaways
* 11x is turning voice AI into a real revenue driver. Their agent, Julian, isn’t just automating calls; it’s reshaping how businesses scale outbound sales with speed, empathy, and autonomy that rivals human reps.
* There’s nothing more agentic than a phone call—full autonomy, real-time decisions, no undo button.
* Speed to lead isn’t just a sales metric, it’s a core design principle for AI voice agents like Julian.
* 300ms end-to-end latency is now the bar for natural-feeling AI phone calls, including STT, LLM, RAG, and TTS.
* Traditional RAG pipelines break down in real-time voice—fetching data across the wire is too slow and too obvious.
* Masking latency through conversational techniques like “give me a second” is a practical design pattern, not a hack.
* Twilio and similar telephony providers are becoming the new latency bottleneck as expectations move to sub-300ms performance.
* ASR still struggles with critical details like names, emails, and numbers—especially across accents—which can derail entire calls.
* Pronunciation isn’t solved; getting someone’s name or company wrong breaks trust instantly.
* Using phonetic inputs like IPA to improve TTS accuracy shows measurable emotional and experiential gains in customer interactions.
* Good conversation isn’t about perfection, it’s about imperfection that feels human and emotionally attuned.
* The best agents will sound like your imperfect but empathetic friend.
* “Turn-taking” is still a massive challenge. Detection is easy, inference is hard, and both introduce their own bugs.
* Backchanneling (knowing when to respond, not just when to talk) is an unsolved frontier in conversational AI.
* Emotional intelligence isn’t a bonus for voice agents—it’s required for use cases like medical, hospitality, and SMB outreach.
* Agents must dynamically adapt empathy patterns depending on user stress, urgency, and tone—or risk sounding tone-deaf.
* Emerging techniques like emotional tagging in TTS are promising but still lack scalable evaluation methods.
* Breaking a single agent into a mesh of specialized sub-agents may outperform monolithic models in complex conversations.
* Self-learning pipelines are still aspirational; manual tuning can’t keep up with the pace or volume of voice interactions.
* Development speed can’t lag behind live conversations; automation loops with human-in-the-loop review are essential.
* Passing the Turing test in voice won’t just be about tone or latency, it will require recursive emotional and contextual depth.
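The 300 ms figure above is an end-to-end budget across the whole STT → retrieval → LLM → TTS hop, which is why one slow stage sinks the call. As a rough illustration, a pipeline team might track it as per-stage budgets; the stage names and millisecond targets below are hypothetical assumptions for this sketch, not numbers from 11x or any vendor.

```python
# Hypothetical per-stage latency budgets (ms) for a voice-agent pipeline.
# Stage names and targets are illustrative assumptions, not vendor figures.
STAGE_BUDGET_MS = {
    "stt_final_transcript": 100,  # streaming STT finalizing the user's turn
    "rag_retrieval": 40,          # in-memory / pre-fetched context lookup
    "llm_first_token": 100,       # time to first LLM token
    "tts_first_audio": 60,        # time to first synthesized audio frame
}
END_TO_END_TARGET_MS = 300

def over_budget(measured_ms):
    """Stages whose measured latency exceeds their individual budget."""
    return [s for s, ms in measured_ms.items() if ms > STAGE_BUDGET_MS.get(s, 0)]

def within_target(measured_ms):
    """True if the summed pipeline latency fits the end-to-end target."""
    return sum(measured_ms.values()) <= END_TO_END_TARGET_MS

measured = {
    "stt_final_transcript": 120,
    "rag_retrieval": 35,
    "llm_first_token": 95,
    "tts_first_audio": 55,
}
print(over_budget(measured))    # which stage to optimize first
print(within_target(measured))  # 305 ms total, so the 300 ms target is missed
```

Seen this way, the takeaway about RAG becomes concrete: any retrieval that crosses the network can consume most of the 300 ms on its own, which is why pre-fetching and masking phrases like “give me a second” read as design patterns rather than hacks.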
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Rebecca Greene, Co-Founder and CTO at Regal AI.

Rebecca Greene is the Co-Founder and CTO of Regal, the AI Agent Platform for enterprise contact centers. Prior to Regal, Rebecca was the Chief Product Officer at Handy (acquired by Angi) and started her tech career at Amazon. She has an MBA from Harvard and a BA from Columbia.

Recap Video

Takeaways
* Regal AI builds voice agents that combine real-time AI with human-level personalization to automate high-value conversations.
* Voice AI works best in industries where conversations drive value, not just cost savings.
* Contact center use cases that involve emotion or complexity are more compelling for AI agents than low-stakes retail.
* AI agents can outperform humans in uncomfortable tasks because they lack shame or fatigue.
* Inbound support is being redefined by voice agents that actually understand intent and take action.
* The real threat to IVRs isn’t better UX, it’s full obsolescence via intelligent voice agents.
* Long conversations, emotional complaints, and group calls are still weak spots for AI agents.
* Customers over-engineer guardrails out of fear of hallucination, then wonder why performance lags human agents.
* Top-performing human agents routinely go off-script, yet companies expect AI agents to stick to it. This creates false performance comparisons.
* Real innovation starts by training agents based on top human reps, not static call scripts.
* Every stage of the AI agent lifecycle—especially post-deployment—still has unsolved product gaps holding back performance and adoption.
* Simulated testing is unreliable because it assumes “perfect users,” but real customers are messy, emotional, and long-winded.
* Agent testing and improvement today is more product design than engineering—continuous iteration is the norm.
* Most enterprise customers aren’t equipped to build or manage AI agents without hands-on help, even if they are tech savvy.
* Prompts for production agents are wildly different from ChatGPT-style usage, and most contact center leaders don’t realize the complexity.
* Regal’s approach includes embedding engineers to co-build with customers, showing that hands-on support is still a differentiator in Voice AI.
* Rebecca predicts 90% of contact center voice interactions will be AI-led within five years, unless regulators step in.
* AI agents won’t just match human performance, they’ll eventually surpass it with personalization, memory, and scalability.
* Future agents won’t just optimize for the “average” customer—they’ll dynamically shift tone and pacing based on individual behavior.
* Voice interfaces will move from the call center to the product itself and embed real-time conversation into apps as a core feature.
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Ruma Nair, Principal Product Manager for AI and Customer Engagement at Twilio.

Ruma Nair is a Principal Product Manager at Twilio, leading AI-driven innovation on Twilio Flex, the company’s programmable contact center platform. She specializes in building scalable generative and predictive AI solutions that solve both current and emerging user pain points. With deep expertise in product strategy, discovery, and user behavior analysis, Ruma blends data, customer insight, and technical collaboration to deliver impactful product experiences.

Recap Video

Takeaways
* Ruma leads AI product development for Twilio Flex, focusing on tools like Agent Copilot that improve agent onboarding, efficiency, and real-time support.
* AI that doesn’t make agents faster, smarter, and more human in real time is already obsolete.
* Agent onboarding takes 7–8 months, but many agents leave before that; the best bet is AI that cuts onboarding time as close to zero as possible while ensuring training consistency.
* Contact centers that don't automate after-call work are wasting their best chance to turn customer conversations into business strategy.
* Real-time agent assist is the new frontier—it's the difference between good agents and great ones.
* Static FAQs are dead; the future is live, contextual copilots that think alongside your agents mid-conversation.
* Cutting onboarding time by 40–50% isn’t a nice-to-have—it’s the only way BPOs and contact centers survive today’s churn rates.
* Real success in CX isn't just faster training—it's balancing speed with customer sentiment so NPS and CSAT don’t collapse.
* Measuring customer sentiment only at the start of a call is a mistake—outcome sentiment at the end is what actually matters.
* Real-time agent assist needs to be invisible and perfectly timed—bad UX will get shut off faster than you can deploy it.
* Twilio's modular approach isn't just engineering pride—flexible AI platforms will beat one-size-fits-all CCaaS every time.
* The biggest gap in CX tech isn’t features—it’s connecting the dots between great point solutions.
* Most buyers aren’t confused by AI—they’re paralyzed by choice and burned out by failed pilots.
* Jumping from shiny tool to shiny tool without a clear North Star is why so many AI initiatives die on the vine.
* AI isn’t always the answer—sometimes better UX beats any model.
* In this market, the biggest risk is not building fast—it’s building on the wrong architecture.
* Betting on a single AI model vendor is a trap—future-proofing means staying model-agnostic and ready to swap.
* If you don't involve legal, procurement, and security from day one, your favorite AI pilot is probably already dead.
* The graveyard of AI projects is full of great POCs that never survived the security and compliance gauntlet.
* Winning in AI isn’t just running pilots—it's productionizing fast while clearing blockers early.
* In the new AI era, flexibility beats first-mover advantage every time.
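The model-agnostic point above is usually realized as a thin adapter layer: application code depends on a small interface, and each vendor SDK hides behind its own adapter, so swapping models touches one class instead of every call site. The sketch below is a hypothetical illustration of that pattern; the `ChatModel` protocol, `EchoModel` stand-in, and `Copilot` class are invented for the example and are not Twilio or Flex APIs.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Minimal provider-agnostic interface for a text-completion model."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in adapter for the sketch; a real adapter would wrap a vendor SDK."""
    prefix: str = "echo: "

    def complete(self, prompt: str) -> str:
        return self.prefix + prompt


@dataclass
class Copilot:
    """Application code depends only on ChatModel, so swapping model vendors
    means swapping one adapter, not rewriting the copilot logic."""
    model: ChatModel

    def suggest_reply(self, transcript: str) -> str:
        return self.model.complete("Suggest a reply to: " + transcript)


copilot = Copilot(model=EchoModel())
print(copilot.suggest_reply("My invoice is wrong"))
```

Keeping the interface this small is what makes "ready to swap" cheap: a new vendor only has to satisfy `complete`, and legal or security concerns about one provider never force a rewrite of the surrounding product.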
In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Wayne Butterfield, Partner - AI, Automation & Contact Center Transformation at ISG.

For over 15 years, Wayne has been at the forefront of AI and Intelligent Automation, pioneering solutions, taking calculated risks, and guiding businesses through digital transformation. Wayne is the Global lead for Contact Center Transformation and is responsible for all CC deployments across ISG. He works with clients in many forms, from AI Usage policy creation to Transformational Sourcing. He is heavily involved in Contact Center Research and partner ecosystems, ensuring he is always up to date on the latest advancements in the Contact Center space. He’s a globally recognized thought leader and keynote speaker on all things Contact Center AI & Automation, and host of the award-winning Bots & Beyond Podcast.

Recap Video
Takeaways
* Partner at ISG Automation, Wayne leads strategy on AI, automation, and contact center transformation, advising on billion-dollar outsourcing deals in an AI-first world.
* Many contact centers are built around people, not tech, and aren’t set up to innovate or automate well.
* Clients often just want guaranteed headcount at a set cost—not better outcomes or smarter systems.
* The real disruptors may be tech companies entering the contact center space from the outside, not legacy providers.
* The old BPO value prop, “cheaper, not better,” is being replaced by deep industry expertise and smart tech deployment.
* Top BPOs now know industry processes better than the companies they serve.
* Real-time translation and accent smoothing could make language-based outsourcing obsolete.
* With voice AI handling language barriers, companies won't need 90+ service locations worldwide.
* What Wayne calls "BPO 4.0" is coming: tech-first, insight-driven, and built for scale.
* Easier service means more customer calls, not fewer.
* The real volume shift won’t be from more complaints, but from proactive customer service initiated by brands.
* Most companies still treat customer support as a cost center, not a chance to build loyalty or insight.
* Support calls are the only real customer touchpoint—and they’re ignored.
* Every call is packed with insight, but most of it’s wasted.
* AI can finally turn call data into business intelligence, if companies are willing to invest in it.
* You can’t expect transformation from a service provider if you block them from accessing your systems and tools.
* If enterprises keep gatekeeping their tech stack, they’ll still be talking about the same problems 10 years from now.
* Most service provider contracts disincentivize automation because it cuts into their billable hours and revenue.
* Outcome-based pricing models—with guaranteed savings over time—are the future of BPO contracting, not hourly rates.
* A five-to-ten year shift is coming where service delivery becomes tech-first, and only those who transform now will survive.