On-Call Nightmares Podcast


Author: Jay Gordon



Being on-call in a tech team can lead to some interesting stories. On this podcast we'll talk to a variety of people from the world of technology, discuss their experiences in on-call and find out some nightmares they survived.

Hosted by Jay Gordon - Twitter @jaydestro
48 Episodes
Hey friends, it's been a while. I haven't been on-call, but I have been working on meeting tons of new people for new content for this podcast. I can't do it alone though. Would you like to be on the podcast? Reach out! Twitter: Email: The commitment for your story is under 35 minutes and you'll have a lasting testimony of your experience on-call.
Well, 2019 is just about done, and that means one more podcast. This time I break format a bit and welcome on Corey Quinn. Corey and I take a look at how he founded The Duckbill Group and how the company helps people save money on their AWS bills. Then Corey and I take a dive into some of the topics that impacted the cloud in 2019. A fun conversation to end 2019! Corey is the Cloud Economist at The Duckbill Group. Corey specializes in helping companies improve their AWS bills by making them smaller and less horrifying; hosts the Screaming in the Cloud and AWS Morning Brief podcasts; and curates Last Week in AWS, a weekly newsletter summarizing the latest in AWS news, blogs, and tools, sprinkled with snark.
It's the One Year Anniversary of On-Call Nightmares. When I set out to start this podcast, there were a few people on a list that I just felt I needed to speak to. I finally checked off the first name I had on the list. Episode 45 is a conversation with Google Principal Developer Advocate, Kelsey Hightower. Kelsey Hightower is a Technologist working at Google while learning in public.
This week I chat with Silvia Botros, also known as @dbsmasher on Twitter. I learn about her experiences on-call for databases, motherhood and an affinity for breaking things. An awesome conversation with an incredible person. Silvia Botros is a Sr Principal Engineer at Twilio. She focuses on ways to break databases but is also talented at finding bugs in all your software, whether she helped build it or not. When she is not helping Twilio SendGrid send billions of emails a day, she is busy training her little replicas on also breaking computers and trolling her friends on Twitter.
One of the best parts of attending DOES 2019 in Las Vegas was meeting so many of the leaders and innovators from the world of DevOps. Damon Edwards's work is extremely well known in the DevOps field and I was lucky enough to discuss his history during this interview. Damon Edwards is a Co-Founder of Rundeck Inc., the makers of Rundeck, the popular open source Operations Management Platform. Damon has spent over 15 years working with both the technology and business ends of IT Operations and is noted for being a leader in porting Lean and cutting-edge DevOps techniques to large-scale enterprise organizations. Damon is a frequent conference speaker and writer who focuses on DevOps, SRE, and Operations improvement topics. Damon is active in the international DevOps community, a co-host of the DevOps Cafe podcast, and a content chair for Gene Kim’s DevOps Enterprise Summit.
The number 42 has a huge meaning for baseball fans. Jackie Robinson wore 42, Mariano Rivera wore 42 and now one of the greatest in DevOps, John Willis wears the On-Call Nightmares podcast episode #42! Learn from John's past, his present and his future at Red Hat. We got together at the 2019 DevOps Enterprise Summit in Las Vegas to chat about all things DevOps and a lil Yankees baseball (not much). By far one of the most important episodes of the podcast yet. John Willis has worked in the IT management industry for more than 35 years. Currently he is part of Red Hat's Global Transformation Office which will be focused on accelerating our customers' digital visions while bringing holistic change across their technological AND social systems. He was formerly Director of Ecosystem Development at Docker. Prior to Docker, Willis was the VP of Solutions for Socketplane (sold to Docker) and Enstratius (sold to Dell). Prior to Socketplane and Enstratius, Willis was the VP of Training and Services at Opscode, where he formalized the training, evangelism, and professional services functions at the firm. Willis also founded Gulf Breeze Software, an award-winning IBM business partner, which specializes in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks on enterprise systems management and was the founder and chief architect at Chain Bridge Systems. Beyond the Phoenix Project - Audiobook Maslach Burnout Inventory -
On-Call Nightmares returns to talk to the man from Texas who represents Big Blue, JJ Asghar. JJ and I discuss his start as a 15-year-old in technology and how on-call has morphed over the years. JJ works at IBM on the IBM cloud as a Developer Advocate. He’s focusing on the IBM Kubernetes Service trying to make companies and users have a successful onboarding to the Cloud Native ecosystem. He lives and grew up in Austin, Texas. He enjoys a good strong stout, hoppy IPA, and some team building Artemis, modding Dwarf Fortress, Rimworld, or Factorio. He’s a member of the Church of Emacs, though jumps into Vim on remote machines. He usually chooses Ubuntu over CentOS, but secretly wants FreeBSD everywhere. He’s always trying to become a better Ruby developer, but experiments with Go, Python, and only when he has to, Node. A father and husband, if he’s not trying to automate his job away he’s always trying to convince his daughters to “be button makers not button pushers.”
A big milestone, episode 40! This week I speak with Netflix SRE Ryan Kitchens about birds, DR and movies! Ryan Kitchens has been in a variety of positions in software over the past ten years allowing him to experience the good and the bad, the amazing and the bizarre. As an SRE with a film degree, he currently works at Netflix on the CORE team, focused on ensuring availability. The background of the team spans incident management and analysis, resilience engineering, and human factors & systems safety.
This week I speak with Dan Bentley of Tilt! Dan is a software engineer who's currently fixing microservice development as CEO of Tilt. Before that, he was at Google for 11 years and then Twitter, working on tools for devs and tools for non-developers. He's opened for The Who and has checks from Donald Knuth. Transcript:
Live from DevOpsDays Portland, I speak with Gene Kim, author of "The Phoenix Project" and the upcoming book "The Unicorn Project." When I started this podcast, one of my goals was to talk to Gene about his own experiences in IT. Thankfully, this trip to DevOpsDays in PDX helped that happen. Cameos by Jennifer Davis, Matty Stratton, Jason Yee and Terri Haber! Gene Kim is a multiple award-winning CTO, researcher and author, and has been studying high-performing technology organizations since 1999. He was founder and CTO of Tripwire for 13 years. He has written five books, including “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win”, “The DevOps Handbook”, “Accelerate” and the upcoming “The Unicorn Project”. Since 2014, he has been the organizer of the DevOps Enterprise Summit, studying the technology transformations of large, complex organizations. Transcript - The Unicorn Project - DevOps Enterprise Summit Las Vegas -
The On-Call Nightmares Listener feedback system works! Without your stories I just cannot do this podcast. Thankfully, Jason Schuster reached out to share his experience in a 20-year career in technology. Share in his nightmare on this latest episode! Transcript: Jason's Bio: After graduating with a BFA in theater design in 2000, I landed my first job administering HP-UX servers. I took a lowball salary in exchange for training. While I got the training, it took a long while for the scales to even out, since I inherited an outgoing sysadmin's servers when I was less than a year on the job. My true passion for automating all the things came on an off-site DR test, watching two senior admins formatting disks one at a time and building a crazy number of volume groups and LUNs on them by hand. DR used to be a really interesting space that having so much stuff virtualized has mostly solved. After working on various .gov contracts and then supporting internal systems for 13 years, I made the jump to DevOps at one small startup that folded out from under me but did start me on my way. I joined Stratasan just after New Year's and am loving this place. We are big fans of making boring things boring and not adding unneeded tools to our lives. Mostly I have been extending the reach of our Terraform while trying to cut down the number of services we use in AWS to just what is needed. I have also been highlighting metrics we are missing to help us make good planning choices.
Live from DevOpsDays Chicago! I meet up with Ops Veteran, Michael Stahnke as we discuss his career in technology. From the weird days of AIX systems all the way to his time now at CircleCI, Michael has plenty of great stories. Special cameos by Jason Yee and Joshua Zimmerman (our laugh track). Michael Stahnke is VP of Platform Engineering at CircleCI. Prior to this role, he was at Puppet running engineering for Puppet Enterprise, Puppet Open source, and SRE. He was an author of the State of DevOps Report in 2018 and 2019. Michael also helped get the Extra Packages for Enterprise Linux (EPEL) repository off the ground in 2005, is the author of Pro OpenSSH (Apress, 2005), and is an organizer of DevOpsDays Madison. You can reach him @stahnma on nearly any service online. Transcript:
Getting paid is a pretty dang important part of your job. Mike Grayson and the team at Paychex are working to make sure that the databases that handle that are always online. This week I catch up with Mike Grayson who's been a great advocate for the database ops community. Mike is a Senior Database Engineer specializing in DevOps, MongoDB, and Apache Kafka based out of Rochester, New York. He is a MongoDB Master and speaker in the Oracle, SQL Server and MongoDB communities. Transcript:
X gonna give it to ya! Xander from the Microsoft Azure Kubernetes SRE Team joins me to talk about his history on-call and more! Xander is a Site Reliability Engineer at Microsoft, where he currently slings containers on Azure Kubernetes Service. Prior to Microsoft, he did all the things with retail tech at both Starbucks and Target. You are always welcome to send him your favorite cat pictures. @XanderGrzy Full Transcript:
On-call can come in different shapes and sizes. Sometimes it's a group of developers who are attacking a problem to keep other developers afloat. That's what Ben Halpern and the team at the DEV Community are up to. Founder of DEV, Canadian, generalist software developer who writes a lot of Ruby. Transcript:
This week I speak with my friend Matty Stratton as we discuss the hard times and the processes to make them better. Matty Stratton is a DevOps Advocate at PagerDuty, where he helps dev and ops teams advance the practice of their craft and become more operationally mature. He collaborates with PagerDuty customers and industry thought leaders in the broader DevOps community, and back when he drove, his license plate actually said “DevOps”. Matty has over 20 years experience in IT operations, ranging from large financial institutions such as JPMorganChase and internet firms, including He is a sought-after speaker internationally, presenting at Agile, DevOps, and ITSM focused events, including ChefConf, DevOpsDays, Interop, PINK, and others worldwide. Matty is the founder and co-host of the popular Arrested DevOps podcast, as well as a global organizer of the DevOpsDays set of conferences. He lives in Chicago and has three awesome kids, who he loves just a little bit more than he loves Doctor Who. He is currently on a mission to discover the best pho in the world. Transcript (txt format) - Pagerduty Summit - sept 23-25 in San Fran. Breakathon, etc. for a great discount PDS19SAT Devopsdays chicago - use the code ADO2019 for 20% off. Breakathon -
Datadog Dash was this week which meant I was lucky enough to catch up with my friend, Jason Yee. We discuss his time in tech, measuring everything and a lot more! Jason is a technical evangelist at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. Previously, he was the community manager for DevOps & Performance at O'Reilly Media and a software engineer at MongoDB. He's currently exploring the world while living as a nomad and would love to hear about where you live. transcript:
Episode 30 is a waterfall of information you'll soak up and learn a ton from. Things get a bit wet and wild for Tim in this episode of On-Call Nightmares! A great discussion about a long history in tech, the things you just can't plan for and more. Tim is an engineering manager at InfluxData with over 20 years of experience. His technical interests include high-performance, scalable, fault-tolerant cloud infrastructure, interconnected hybrid architecture, containerization (c14n?) all the way down, and always winning buzzword bingo. Helping teams achieve their highest potential is his true calling, which often means planting ideas and staying out of the way. transcript:
This week's conversation is with Molly Struve of Kenna Security! We discuss her path to tech, how her team worked to fix their on-call rotation and more! Molly Struve is the Lead Site Reliability Engineer at Kenna Security. She joined Kenna in 2015 and has had the opportunity to work on some of the most challenging aspects of Kenna’s code base. This includes scaling Elasticsearch, sharding MySQL databases, and creating an infrastructure that can grow as fast as Kenna's business. When not making code run faster, she can be found fulfilling her need for speed by riding and jumping her show horses. Transcript:
This week my homie supreme, Jason Hand, joins me on On-Call Nightmares. We talk monitoring, SRE and getting in the van. Jason has spent the last 5 years connecting with technologists around the world on ideas related to balancing system and service reliability with the speed and agility required in today's digital world. Previously at VictorOps, Jason authored four books on the subjects of Site Reliability Engineering, Post-Incident Reviews, and ChatOps and was named "DevOps Evangelist of the Year" in 2016 by Co-organizer and emcee of the annual DevOpsDays Rockies conference, the Frontrange Site Reliability Meetup, Denver DevOps Meetup, and DevOps Road Trip, Jason enjoys connecting storytellers and actionable ideas with those who are hungry to learn. Co-host of the podcast "Community Pulse", Jason helps to bring together ideas and expertise as it relates to building community within tech (i.e., advocacy, evangelism). In his spare time, you'll find Jason soaking up the beautiful Colorado outdoors on a trail, lake, river, or mountain by day and enjoying craft IPAs and bluegrass music by night. Transcript: