
Screaming in the Cloud

Author: Corey Quinn

Description

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.
236 Episodes
About Jesse

Jesse is a seasoned operations engineer with a deep passion for understanding complex technical and organizational systems. He's spent his career helping Engineering teams achieve their business goals by improving how they interact with their technical systems, and with each other. He's currently a Cloud Economist with Duckbill Group, guiding organizations along their journey of cloud cost optimization and management.

Links:
The Duckbill Group: https://www.duckbillgroup.com/
Jesse’s Twitter: https://twitter.com/jesse_derose
AWS Morning Brief: https://www.lastweekinaws.com/podcast/aws-morning-brief/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translate to DevOps success. All that and more awaits you. Visit www.puppet.com to download your copy of the report now!

Corey: Up next we’ve got the latest hits from Veem.
It’s climbing charts everywhere and soon it’s going to climb right into your heart. Here it is!

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Jesse DeRose, my colleague and cloud economist at The Duckbill Group. Jesse, thank you for joining me, even though when I asked you, it isn’t exactly like you felt you had much of a choice.

Jesse: [laugh]. I appreciate being on this podcast with you. I think you’ve had an opportunity to talk to a few other folks from our organization so far, so I’m just happy to be included. I’m hoping that I get the Members Only jacket after this recording.

Corey: Oh, absolutely. The swag that goes out to guests is secret and wonderful, all at the same time. So, let’s start at the very beginning. It turns out that despite the prevalent narrative that we put out there, our cloud economists did not just spring fully formed from the forehead of some god and appear, fully formed, ready to slice AWS bills to ribbons. There’s a process and there’s always an origin story. Where do you come from? Where were you before this?

Jesse: So, my background is mixed. I started with a management of information systems degree, which is inherently interdisciplinary. You’re linking, sort of, the importance of both technical systems and business goals and outcomes into a single degree. And when I graduated with this degree, I went out into the working world and said, “Okay, I’m here. I’m ready. Here’s my degree. Here’s what I’m interested in. Let’s do this.”

And nobody knew what to do with me, Corey. Nobody knew what management of information systems was.
They simultaneously said, “You know, you don’t have a computer science or a computer engineering degree, so we don’t believe that you’ve got programmer chops.” Especially since this is the golden age of boot camps and quickstart programming books, and this movement that anybody can be a developer, which makes it even stranger that I’m not spending my weekends learning every programming language under the sun and automating the little tasks.

Corey: Oh, absolutely. In fact, I got the exact same feedback; I don’t have a degree, and, “Oh, you couldn’t possibly be a good programmer because you don’t have the degree, you didn’t go to the right schools, you didn’t go to a boot camp. And when I asked you to write some sample code to demonstrate how to program, you just started crying instead.” Because yeah, it turns out, I’m actually not a good programmer, I just, sort of, brute force my way through it. And so far, so good.

Unfortunately, people aren’t paying me to program these days, for excellent reason. In fact, I strongly suspect some people are paying me not to. But yeah, now it’s funny to laugh about, but back when you’re getting started, and going out in that space, in the world of operations, which we both came up in, and looking at it through a lens of the SRE movement, suddenly, “Hey, you used to be really, really good at all these Linux things and working on systems and keeping them up. Great. Now, learn to code.” And that was a big lift, at least for me.

Jesse: Absolutely. I struggled with that so much because every company that I interviewed with, every company that I even talked to, just assumed that I had some kind of programming experience and didn’t want to talk to me if I didn’t have programming experience.
And to me, looking back, I think I really look at it like I may not have the programming chops to be the software engineer that is going to write the code for you, day in and day out, but I am the person who knows enough that I can have the conversation with the software engineers. I can be the SRE, I can be the ops person that has the conversation with your software engineers and knows the things that they’re talking about, but also knows when to get out of their way and let them be the expert and do what they do best.

Corey: There’s something to be said for valuing expertise in areas that are, how to put this, not the thing that you think you’re looking for. I mean, back when I was getting jobs, before The Duckbill Group, and I would be the first ops hire into a team of developers (which happened a few times), the process was always the same, where you’d have a bunch of developers asking what they thought were ops questions, or just giving up on that entirely and trying to figure out how decent of an ops person I would be by how badly I programmed.

Or, “Oh, okay, cool. You’re an ops person. Great. Can you invert a binary tree on a whiteboard?” It’s, “No, but I can invert a rack in your data center. I’ll go rage-flip the rack. Why not?” And it takes time. You have to guide those interviews and those conversations. But it’s always weird. Interviews are always weird because you’re being judged on a skill set that only matters when you’re interviewing for a job.

Jesse: Yes. This is one of the things that I struggled with the most because I knew that most of the people who were interviewing me were either business people themselves, so they assumed that because their engineering team thought a certain way and acted a certain way, I should act that same way, specifically the same way as everybody else on the team. Or they were software engineers themselves, so they said, “Okay, I know how to invert a binary tree” (to your point, Corey), “so, do you know how to invert a binary tree?
If you know how to do that, then sure, you can be part of my team because you know how to do these things and think about these things the same way I do.” Whereas because I was coming from this operations space, I knew other things that were equally as important but weren’t part of the conversations that they were used to having, day-to-day.

So, they didn’t understand that just because I didn’t have the same engineering chops as them, that I didn’t have important information to share and wasn’t able to stand on my own two feet in other ways. And that was one of the things that I really struggled with when I was starting out in the industry because I was thinking to myself, well I have such passion to be part of these conversations, to have that conversation between the business side of the organization and the engineering side of the organization, from an ops perspective, from a business perspective, from a technical perspective, and if I can’t convince these people of my own volition, of my own passion for the good of the company, maybe data will help. Maybe there’s something that I can find from a scientific research perspective. Maybe there’s something that... I’m sure somebody else has already researched this topic or found the same problems that I have in this space, and maybe they’re already talking about it, and maybe I can ride their coattails, so to speak, or follow in their footsteps and use the information that other people are talking about in the industry to help me not just land these jobs, but ultimately better sell myself and help these companies move forward.

Corey: That is fundamentally an encapsulation of what I believe the ops role to be: make things better, and move them forward. But man, do we get stuck in an awful lot of weird and strange places. And interviewing itself is a skill. Giving an interview, very often, it’s a, “I know a bunch of things that are trivia, but I know them.
And if I know them, everyone must know them, therefore, if you don’t know them, you must be bad at things.” And it turns out, for better or worse, being able to memorize the documentation and spit out answers is not indicative of whether someone is a good ops person or a terrible one.

Jesse: Absolutely. I think that is one of the biggest problems that I have faced and one of the biggest problems in the interviewing space today because it’s not just about, can you regurgitate this information, but it’s about how do you think? How do you look at problems? How do you communicate to the rest of your team, and within the rest of the organization? Those technical skills are ultimately important because you do need to understand some amount of technical information to have those conversations, but the soft skills are also super, super important to be able to communicate effectively, to be able to think collaboratively, and help everybody, not just yourself, but help the team build that shared purpose and move forward together.

Corey: We’re talking right now, so far, about traditional ops roles. Then we have what we do here, which is beyond the rest of all of that, where all of what we just said is necessary but not sufficient. Then it comes down to, great, okay. So, you understand how systems work together; we found that, for what we do and how we do it, you need to be a competent ops person as a fundamental tenet.

Otherwise, learning what all these AWS services do will occupy you for the next three years. So okay, we start off there. Then on top of that is, okay, there are consulting skills that it turns out are possible to teach, but incredibly challenging and time-consuming because a lot of them boil down to, can you be in a meeting with stakeholders of various levels? Can you deliver bad news in a way that they don’t hate you?
Because they don’t really want to pay you to just say yes to whatever they think.

And can you do that in such a way without, you know, actively insulting them, which sounds like a strange thing until you realize, oh, wait, that’s right. I do that, too. So ooh, yeah. Corey is going to have that problem, isn’t he? Yeah. And that’s part of the beautiful part about this place is that finally, I’m able to hire people like you.

You were the first cloud economist here, which meant suddenly I didn’t have to do it all myself and my mouth slowly stopped getting me in trouble in consulting engagements, so I could spend more time having my mouth getting me in trouble on Twitter.

Jesse: Yeah, I have to tell you, Corey, when I originally spoke with you and Mike about this role, I had just taken another operations position with a tech startup, and I was about two months into the role. And Mike sat down with me for coffee one morning and said, “Hey, we’re thinking about doing this thing. Are you interested?” And I said, “Yes, but I just started this other operations gig. I can’t up and leave them; I really care about the team, I really care about the company. And it would look really bad on me if I just, you know, left after two months, unless they were a really, really awful employer, which they weren’t.”

So, I said, “Sure. I’m interested in doing some kind of part-time work.” And that’s ultimately where I started with you and your business partner, Mike. And I have to admit, too, when Mike originally approached me and said, “Hey, this is what we are thinking about; this is what we’re doing,” I didn’t really think twice about the opportunity because I wanted to work with you and Mike again.

But the way that Mike described the work just didn’t stick with me. It didn’t resonate with me; it was more about, “Hey, I would love to work with Mike and Corey again,” than, “Oh, my God.
This sounds like the dream role that I want to be a part of.” And then, when I came back to Mike, probably, I don’t know, a month or two later, after I had started working part-time with both of you, I said, you know, “Mike, I don’t think I really made myself clear. I want to make sure that I help you understand, ultimately, the things that I want to do are having these conversations, being that bridge with the business side, and being able to talk tech with the tech side, and being able to talk business, and make sure that both sides of the conversation are aligned.”

And he just looked at me and said, “What do you think we’re doing? What do you think we sold you on?” And it was that aha moment where I thought, “Oh, my God, yes.” I had already said yes; I was already working with both of you part-time, but that was the moment that really solidified it for me of, this is what I’ve kind of been moving towards. I’ve been wanting to be that person that can speak both languages and have a conversation with both sides of the table, and speak to multiple different audiences, and now I’m finally getting the chance to do that. I’m getting the chance to grow both skill sets, which I think is extremely rare in a lot of the smaller tech spaces that we see today.

Corey: You’ve hit on one of the secrets of The Duckbill Group, if I can be so grandiose as to claim that. And it’s true because we take a look at people that we bring in, and things that they’re good at, and things that we do: the things we do publicly, and the things that we do, sort of, behind the scenes. There’s no reason we don’t talk about them publicly, but there’s no real reason for us to do so. Easy example, and what I want to talk to you about next is, you are deep into improving understanding of complex systems, both technical (okay, great, people expect that) and organizational, which sometimes throws people for a loop. And it sounds like a weird thing to focus on here because we fix the AWS bill.
We do not bill ourselves as management consultants, we do not bill ourselves as coming in and we will restructure your organization because that sounds patently ridiculous, and no one in their right mind is going to buy that thing.

I wouldn’t buy it, at least not for me. My God, there are large consultancies that specialize in these sorts of things. I don’t know how they do it because I certainly don’t. We’re not here to sell that, though. Fixing the AWS bill, I mean actually fixing it, fixing the business problem tied to it, mandates an understanding of those complex systems. And your expertise and interest in that area is incredibly helpful here. Tell me more about it.

Jesse: This is one of the things that I’ve been really fascinated by ever since I joined Duckbill Group. I think everybody in Duckbill Group has a superpower or has a really passionate hobby to some extent, which makes each of us really interesting, unique individuals that can focus in different areas of a client’s bill or a client’s pain points when it comes to cloud cost management, and help find the parts that are frustrating, find the levers that can be moved, and point them out and say, “Okay, this is ultimately where you want to pull this lever or not pull this lever to make these changes.” And to your point, Corey, the one that is most interesting and passionate for me is that organizational development space. It is really understanding, not just the small things that we can do today to help you save money on your AWS bill, but how we, and collectively our clients, can think about costs long term to save money on their AWS bill.
And I know that sounds really, really broad, and that’s part of why I think that there is a lot of nuance in this space, to your point about other organizations or other vendors that are providing these consulting services, and I think it is also something that is difficult to sell, which is why it’s not our expertise in terms of what we are, on the cover, trying to sell to any of our clients.

But I definitely think that there is opportunity to have some of those conversations within each of our clients’ spaces to talk about some of the pain points that we see that may ultimately lead to better cost management practices long term, things that ultimately might help the engineering teams communicate better with finance on a long-term basis, help the finance team and any of the leadership team more collaboratively talk with the engineering teams about understanding how much money is the product costing us? How much can we continue to spend on this product? Or how much can we discount one of our products for our customers before we are losing shares, losing money? What are the fine lines that we understand, based on how much money we’re ultimately spending on these features, on these products, that will help us make better data-driven decisions about other parts of the company?

Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things for a company that deploys an agent; there’s no other reason. Because let’s face it. Agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment, and that’s as easy to install and maintain as a smartphone app. It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage.
With Orca Security, there are no overlooked assets, no DevOps headaches (and believe me, you will hear from those people if you cause them headaches), and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have.

Corey: And I want to call out that this is something that we are comparatively enthusiastic amateurs around. It’s valuable; it’s important; it’s an awful lot of deep work, but I’m not sure that we go more than three working days without referencing Dr. Nicole Forsgren’s work, internally, as we think about these things. So, if you’re hearing this, and you think that okay, AWS bill, fine, whatever. We really want to talk about organizational challenge and improvement, oh, my God, talk to Dr. Forsgren. Holy crap. She’s been on this show, I think, at least three times now, and every time I feel like I’m lucky to get her. Most weeks, you know, I’m stuck with people like you. My God, Jesse.

Jesse: [laugh].

Corey: But no, her work is seminal in this space. And in seriousness, every time I start to question the value of expertise, I look at how deep she goes on all of these things and the level both of understanding that’s baked into this, and the amount of sheer work that it takes for her to take all of that very deep, penetrating analysis, and make it accessible and understandable. But every time I look at her work, I come away more impressed than I started, and that wasn’t a low bar to begin with.

Jesse: Yes. And this gets back to my earlier comment about, I just want to be here to help. And in a lot of cases, when I was starting out, folks didn’t know what to do with me because I didn’t have data, I didn’t have any information. But we have folks in the industry like Dr.
Nicole Forsgren, and other folks who are doing the research, who are knowledgeable in this space, who are putting in the effort to run these studies, to analyze this data, to share the results and, to your comment, Corey, to share the results in a way that makes sense to everybody, that’s easy to read, it’s approachable, it’s understandable.

And I am so thankful to have folks in the industry who are doing that work because that is not my expertise. But that means that I get to say to the folks that I’m working with (internally and with our clients), “Hey, don’t take my word for it. There are other folks who have done research, and here’s what the data says, and here’s how we can help you apply this work, or you can apply this work within your own organization.”

Corey: And this is the challenge in some cases, too, where there’s a lot of organizational theory that is being advanced heavily, and in ways that make teams more effective, which is super helpful. The challenge, of course, is that sitting here and talking about the theoretical layout of teams and how to improve functioning as an organization is all well and good, but we’re brought in by our clients to help them with their AWS billing situation, so at some point the conversation has to evolve beyond, “Okay, so here’s what you could do in theory, in a vacuum, assuming spherical cows, et cetera, et cetera.” And their response is, “That’s great. You actually going to fix the bill or just pontificate for a while here?” So, for better or worse, we don’t really get to sit there and have deep organizational conversations at length with our clients, just because that’s not the problem we’re there to solve.
Everyone’s busy and we want to make sure that we’re respectful of their time.

Jesse: Yeah, one of the things that I’ve learned through my time with Duckbill Group and through other similar roles in the past, is that I may have a strong passion, I may have this strong guiding light in my head, but it’s not the same guiding light that our clients or our customers have. And that’s fine because we don’t need to necessarily have the same goal in sight, but that means that I, to best serve our clients or best serve our customers, need to make sure that I am aligning, that Duckbill Group is aligning with the clients that we’re working with, with the organizations that we’re working with. So, I may be thinking to myself, “Oh, my gosh, I would love to come in with this long list of organizational development practices and share a million different things,” but that’s not ultimately what they need. Maybe that’s something that they’re going to want long-term, but it’s not what we’re here for today. And it’s more important to help serve the client that is in front of me that is asking for things now, today, than try to educate them on quote-unquote, “What their problem is,” and then sell them on a solution.

Corey: The thing that I think gets lost as well, whenever I start talking in-depth about what we do on Twitter, for example, is generally from other engineers whose response is, “Okay, yeah, sounds great. You come in and say that it will save a bunch of money before you rearchitect our application. But that’s an awful lot of engineering time, so I bet engineers must hate you.” And the honest answer is, “Yeah. We know that you would save some significant money if you rearchitected your application, but we also pay attention to organizational dynamics, and we know you’re not going to do it because there’s no business justification for doing it.
So, we’re not going to bring it up, other than, possibly, in passing, just so people are aware of the relative benefit if they want to bake that in.”

But we don’t go in and suggest nonsense that is abhorrent. We all started as engineers ourselves. We are sensitive to engineering time, both in terms of what engineers enjoy working on (which is more important than people think) as well as the sheer cost of engineering time. People get concerned about the AWS bill, but it invariably pales compared to the cost of the people working on the AWS infrastructure. You want to optimize the right things, and then you want to get back to doing what your company does, not continue to iterate forward and spend thousands of dollars to save tens of dollars.

Jesse: Yeah, it’s an extremely difficult balance to find. And it’s really important to think about, is the ROI on this change worth the change? How much money am I ultimately going to save for the amount of engineering effort that I’m going to put in? Because we don’t want to run in and tell your engineering teams, “Rearchitect everything,” if it’s going to maybe save them tens of dollars. We want to make it very clear that here are the different levers that you can pull to effect change within your AWS bill, to optimize your AWS bill.

It’s up to you which ones you want to pull. We are just giving you that guiding path, and then you have the option to say, “Yes, I want to spend the engineering effort to get this kind of ROI.” Or, “You know what?
I don’t think that’s the priority for us right now.” One of the things that I’ve noticed with a number of our clients is that balance of, do we focus more time on building new features which brings in new customers, or do we spend time on the existing infrastructure and making changes to the existing workloads that we have?

And it’s this delicate balance of internal work, the things that you have put on the backburner over time that you ultimately say, “Well, I’ll come back to that,” versus the new things that are ultimately going to bring in new customers, bring in new users, potentially bring in more money. Because both are important, but there needs to be a delicate balance of both, and I feel like that’s one of the biggest challenges that we face when managing an AWS bill and trying to optimize, and organize, and better manage cloud costs.

Corey: And that’s fundamentally what it comes down to: what is the best outcome for the client? And the answer to “what is best?” varies based upon their constraints, what they’re focused on, what they’re trying to achieve. You could look at the end result of one of our analyses for a customer, and take issue with it in a vacuum: “But they’re spending all this money on things that they shouldn’t be doing, or don’t need to be. Why didn’t you suggest this, or this, or this, or this?”

And the answer is because, based upon our conversations, we knew that they weren’t going to do it, and suggesting things that we know they’re not going to do is one of the best ways to erode trust. We’re there to deliver an outcome. We’re not naive software that is just running a pile of tools on an Amazon bill and saying, “Here you go. Have fun.” We’re not the billing equivalent of a Nessus scan that someone slaps their logo on top of, drops off, and it’s 700 pages long. “Have a good one. Check, please.” It doesn’t work. Not well, anyway. It doesn’t drive to lasting change.

Jesse: Yeah.
And I think that’s ultimately part of where we come in best because we ultimately want to be that bridge between the engineering teams, and finance, and leadership, and essentially the business side of the business. We want to give both sides the information that they need to be able to speak effectively and collaboratively with the other side. We want to make sure that the finance side understands enough tech that they can work with the engineering teams to guide them in terms of what goals are important for finance, in terms of managing budget, in terms of forecasting spend. And then from the flip side, we want to make sure that engineering understands that the business collaboratively wants to manage these things, and also help build the organization, and engineering has great, great potential to do little things, take little steps to help the organization get the data that they need to make these decisions.

And that scratches that itch for me. That scratches that itch of how can I really be both sides of the conversation? How can I flex the business lingo and also flex the tech lingo, maybe not in the same conversation but with the same client over time, to really help both sides understand that both sides are important, and both sides need to understand each other in order to help the business succeed?

Corey: And that’s ultimately what it comes down to. Jesse, thanks for taking the time to speak with me. If people want to hear more about what you have to say, and how you like to say it, where can they find you, other than going to The Duckbill Group and getting a consulting engagement underway?

Jesse: So, there’s two places that folks can find me. The first is on Twitter at @jesse_derose.
And then the second is our other podcast, the AWS Morning Brief podcast, on the mornings where we don’t get to hear your melodious voice, Corey.

My colleague Pete Cheslock and I are talking about all of the interesting things that we’ve seen on AWS, from client-specific situations to things that we’ve seen on the job in previous organizations that we used to work at, to new features that AWS is releasing. There’s a whole slew of interesting things that we get to talk about from a more practical perspective. Because there’s so many releases coming out day-to-day that you’re obviously helping us stay on top of, Corey, but there’s so many other things that we want to make sure that we are talking about from the real-world applicable perspective.

Corey: And we will, of course, put links to these things in the show notes, but you should already be aware of most of them, at least the ones that are on the company side. Jesse, thank you so much for taking the time to speak with me.

Jesse: Thank you for having me.

Corey: Jesse DeRose, cloud economist here at The Duckbill Group. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment telling me that organizational dynamics really aren’t that hard and you could solve it better than Dr. Forsgren does, in a weekend.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

This has been a HumblePod production. Stay humble.
About James

James is the RedMonk co-founder, sunshine in a bag, industry analyst, loves developers, "motivating in a surreal kind of way". Came up with "progressive delivery". He/Him

Links:
RedMonk: https://redmonk.com/
Twitter: https://twitter.com/MonkChips
Monktoberfest: https://monktoberfest.com/
Monki Gras: https://monkigras.com/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translate to DevOps success. All that and more awaits you. Visit www.puppet.com to download your copy of the report now!

Corey: And now for something completely different!

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by James Governor, analyst and co-founder of a boutique analysis shop called RedMonk. James, thank you for coming on the show.

James: Oh, it’s my pleasure.
Corey.Corey: I’ve more or less had to continue pestering you with invites onto this for years because it’s a high bar, but you are absolutely one of my favorite people in tech for a variety of reasons that I’m sure we’re going to get into. But first, let’s let you tell the story. What is it you’d say it is that you do here?James: We—industry analysts; we’re a research firm, as you said. I think we do things slightly differently. RedMonk has a very strong opinion about how the industry works. And so whilst there are plenty of research firms that look at the industry, and technology adoption, and process adoption through the lens of the purchaser, RedMonk focuses on it through the lens of the practitioner: the developer, the SRE, the people that are really doing the engineering. And so, historically IT was a top-down function: it required a lot of permission; it was something that was slow, you would make a request, you might get some resources six to nine months later, and they were probably the resources that you didn’t actually want, but something that was purchased from somebody that was particularly good at selling things.Corey: Yes. And the thing that you were purchasing was aimed at people who are particularly good at buying things, but not using the things.James: Exactly right. And so I think that RedMonk we look at the world—the new world, which is based on the fact there’s open-source software, there’s cloud-based software, there are platforms like GitHub. So, there’s all of this knowledge out there, and increasingly—it’s not a permission-free world. But technology adoption is more strongly influenced than ever by developers. That’s what RedMonk understands; that’s what makes us tick; that’s what excites us. What are the decisions that developers are making? When and why? 
And how can we tap into that knowledge to help everyone become more effective? Corey: RedMonk is one of those companies that is so rare, it may as well not count when you do a survey of a landscape. We’ve touched on that before on the show. In 2019, we had your colleague, Rachel Stephens on the show; in 2020, we had your business partner Stephen O’Grady on, and in 2021 we have you. Apparently, you’re doling out staff at the rate of one a year. That’s okay; I will outlast your expansion plans. James: Yeah, I think you probably will. One thing that RedMonk is not good at doing is growing, which may go to some of the uniqueness that you’re talking about. We do what we do very well, but we definitely still haven’t worked out what we’re going to be when we grow up. Corey: I will admit that every time I see a RedMonk blog post that comes across my desk, I don’t even need to click on it anymore; I don’t need to read the thing because I already get that sinking feeling, because I know without even glancing at it, I’m going to read this and it’s going to be depressing because I’m going to wish I had written it instead because the points are always so pitch-perfect. And it feels like the thing that I struggle to articulate on the best of days, you folks—across the board—just wind up putting out almost effortlessly. Or at least that’s how it seems from the outside. James: I think Stephen does that. Corey: It’s funny; it’s what he said about you. James: I like to sell his ideas, sell his work. He’s the brains and the talent of the operation in terms of co-founders. Kelly and Rachel are both incredibly smart people, and yeah, they definitely do a fantastic job of writing with clarity, and getting ideas across. My stuff just tends to be sort of jumbled up. I do my best, but certainly, those fully formed, ‘I wish I had written that’ pieces, they come from my colleagues. 
So, thank you very much for that praise of them. Corey: One of the central tenets that RedMonk has always believed and espoused is that developers are kingmakers, to use the term—and I steal that term, of course, from your co-founder’s book, The New Kingmakers, which, from my read, was talking about developers. That makes a lot of sense for a lot of tools that see bottom-up adoption, but in a world of cloud, where you’re seeing massive deals get signed, I don’t know too many developers out there who can sign a 50 million dollar cloud services contract more than once because they get fired the first time they outstrip their authority. Do you think that that model is changing? James: So, ‘new kingmakers’ is quite a gendered term, and I have been asked to reconsider its use because, I mean, I don’t know whether it should be ‘new monarchmakers?’ That aside, developers are a fundamentally influential constituency. It’s important, I think, to say that they themselves are not necessarily the monarchs; they are not the ones sitting in Buckingham Palace [laugh] or whatever, but they are influencers. And it’s important to understand the difference between influence and purchase. You’re absolutely right, Corey, the cloud is becoming more like traditional IT. Something I noticed with your good friends at GCP, this was shortly after the article came out that they were going to cut bait if they didn’t get to number two after whatever period of time it was, they then intentionally signed a bunch of 10-year deals with massive enterprises, I guess, to make it clear that they are in it for the long haul. But yeah, were developers making that decision? No. On the other hand, we don’t talk to any organizations that are good at creating digital products and services—and increasingly, that’s something that pretty much everybody needs to do—that do not pay a lot more attention to the needs and desires of their developers. 
They are reshoring, they are not outsourcing everything, they want developers that are close to the business, that understand the business, and they’re investing heavily in those people. And rather than seeing them as, sort of, oh, we’re going to get the cheapest possible people we can that have some Java skills and hope that these applications aren’t crap. It may not be Netflix, “Hey, we’re going to pay above market rate,” but it’s certainly what do they want? What tools do they want to use? How can we help them become more effective? And so yeah, you might sign a really big deal, but you still want to be thinking, “Hang on a minute, what are the skills that people have? What is going to make them happy? What do they know? Because if they aren’t productive, if they aren’t happy, we may lose them, and they are very, very important talent.” So, they may not be the people with 50 million dollars in budget, but their opinion is indeed important. And I think that RedMonk is not saying there is no such thing as top-down purchasing anymore. What we are saying is that you need to be serving the needs of this very important constituency, and they will make you more productive. The happier they are, the more flow they can have, the more creative they can be with the tools at hand, the better the business outcomes are going to be. So, it’s really about having a mindset and an organizational structure that enables you to become more effective by better serving the needs of developers, frankly. It used to be that only tech companies had to care about that, but now everybody does. I mean, if we look at, whoever it is: Lego, or Capital One, or Branch, the new insurance company—I love Branch, by the way. I mean—Corey: Yeah. They’re fantastic people, I love working with them. I wish I got to spend more time talking with them. 
So far, all I can do is drag them on to the podcast and argue on Twitter, but one of these days, one of these days, they’re going to have an AWS bill bigger than 50 cents a month, and then, oh, then I’ve got them. James: There you go. But I think that the thing of him intentionally saying we’re not going to set up—I mean, are they in Columbus, I think? Corey: They are. The greater Ohio region, yes. James: Yes. And Joe is all about, we need tools that juniors can be effective with, and we need to satisfy the needs of those juniors so they can be productive in driving our business forward. Juniors is already—and perhaps it’s a bad term, but new entrants into the industry, and how can we support them where they are, but also help them gain new skills to become more effective? And I just think it’s about a different posture, and I think they’re a great example because not everybody is south of Market, able to pay 350 grand a year plus stock options. That’s just not realistic for most businesses. So, it is important to think about developers and their needs, the skills they learned, if they’re from a non-traditional background, what are those skills? How can we support them and become more effective? Corey: That’s really what it comes down to. We’re all trying to do more with less, but rather than trying to work twice as hard, how do we become more effective with the time we have and still go home in time for dinner every day? James: Definitely. I have to say, I mean, 2020 sucked in lots of ways, but not missing a single meal with my family definitely was not one of them. Corey: Yeah. There are certain things I’m willing to trade and certain things I’m not. And honestly, family time is one of them. 
So, I met you—I don’t even recall what year—because what is even time anymore in this pandemic era?—where we sat down and grabbed a drink, I want to say it was at Google Cloud Next—the conference that Google does every year about their cloud—not that Google loses interest in things, but even their conference is called ‘Next’—but I didn’t know what to expect when I sat down and spoke with you, and I got the sense you had no idea what to make of me back then because I was basically what I am now, only less fully formed. I was obnoxious on Twitter, I had barely coherent thoughts that I could periodically hurl into the abyss and see if they resonated, but it stands out as one of the seminal grabbing a drink with someone moments in the course of my career. James: Well, I mean, fledgling Corey was pretty close to where he is now. But yeah, you bring something unique to the table. And I didn’t totally know what to expect; I knew there would be snark. But yeah, it was certainly a pleasure to meet you, and I think that whenever I meet someone, I’m always interested in if there is any way I can help them. And it was nice because you’re clearly a talented fellow and everything else, but it was like, are there some areas where I might be able to help? I mean, I think that’s a good position as a human meeting another human. And yeah, it was a pleasure. I think it was in the Intercontinental, I guess, in [unintelligible 00:11:00]. Corey: Yes, that’s exactly where it was. Good memory. In fact, I can tell you the date: it was April 11 of 2019. And I know that because right after we finished having a drink, you tweeted out a GIF of Snow White carving a pie, saying, “QuinnyPig is an industry analyst.” And the first time I saw that, it was, “I thought he liked me. Why on earth would he insult me that way?” But it turned into something where when you have loud angry opinions, if you call yourself an analyst, suddenly people know what to do with you. 
I’m not kidding, I had that tweet laser engraved on a piece of wood through Laser Tweets. It is sitting on my shelf right now, which is how I know the date because it’s the closest thing I have to a credential in almost anything that I do. So, congratulations, you’re the accrediting university. Good job. James: [laugh]. I credentialed you. How about that? Corey: It’s true, though. It didn’t occur to me that analysts were a real thing. I didn’t know what it was, and that’s part of what we talked about at lunch, where it seemed that every time I tried to articulate what I do, people got confused. Analyst is not that far removed from an awful lot of what I do. And as I started going to analyst events, and catching up with other analysts—you know, the real kind of analyst, I would say, “I feel like a fake analyst. I have no idea what I’m actually doing.” And they said, “You are an analyst. Welcome to the club. We meet at the bar.” It turns out, no one really knows what is going on, fully, in this zany industry, and I feel like the thing that we all bond over on some level is the sense of, we each only see a piece of it, and we try and piece it together with our understanding of the world and ideally try and make some sense out of it. At least, that’s my off-the-cuff definition of an industry analyst. As someone who’s an actual industry analyst, and not just a pretend one on Twitter, what’s your take on the subject? James: Well, it’s a remarkable privilege, and it’s interesting because it is an uncredentialed job. Anybody can be, theoretically at least, an industry analyst. If people say you are and think you are, then you are; you walk and quack like a duck. It’s basically about research and trying to understand a problem space and trying to articulate and help people to basically become more effective by understanding that problem space themselves, more. 
So, it might be about products, as I say, it might be about processes, but for me, I’ve just always enjoyed research. And I’ve always enjoyed advice. You need a particular mindset to give people advice. That’s one of the key things that, as an industry analyst, you’re sort of expected to do. But yeah, it’s the getting out there and learning from people that is the best part of the job. And I guess that’s why I’ve been doing it for such an ungodly long time; because I love learning, and I love talking to people, and I love trying to help people understand stuff. So, it suits me very well. It’s basically a job, which is about research, analysis, communication.Corey: The research part is the part that I want to push back on because you say that, and I cringe. On paper, I have an eighth-grade education. And academia was never really something that I was drawn to, excelled at, or frankly, was even halfway competent at for a variety of reasons. So, when you say ‘research,’ I think of something awful and horrible. But then I look at the things I do when I talk to companies that are building something, and then I talked to the customers who are using the thing the company’s building, and, okay, those two things don’t always align as far as conversations go, so let’s take this thing that they built, and I’ll build something myself with it in an afternoon and see what the real story is. And it never occurred to me until we started having conversations to view that through the lens of well, that is actual research. I just consider it messing around with computers until something explodes.James: Well, I think. I mean, that is research, isn’t it?Corey: I think so. I’m trying to understand what your vision of research is. 
Because from where I sit, it’s either something negative and boring or almost subverting the premises you’re starting with to a point where you can twist it back on itself in some sort of ridiculous pretzel and come out with something that if it’s not functional, at least it’s hopefully funny.James: The funny part I certainly wish that I could get anywhere close to the level of humor that you bring to the table on some of the analysis. But look, I mean, yes, it’s easy to see things as a sort of dry. Look, I mean, a great job I had randomly in my 20s, I sort of lied, fluked, lucked my way into researching Eastern European art and architecture. And a big part of the job was going to all of these amazing museums and libraries in and around London, trying to find catalogs from art exhibitions. And you’re learning about [Anastasi Kremnica 00:15:36], one of the greatest exponents of the illuminated manuscript and just, sort of, finding out about this interesting work, you’re finding out that some of the articles in this dictionary that you’re researching for had been completely made up, and that there wasn’t a bibliography, these were people that were writing for free and they just made shit up, so… but I just found that fascinating, and if you point me at a body of knowledge, I will enjoy learning stuff. So, I totally know what you mean; one can look at it from a, is this an academic pursuit? But I think, yeah, I’ve just always enjoyed learning stuff. And in terms of what is research, a lot of what RedMonk does is on the qualitative side; we’re trying to understand what people think of things, why they make the choices that they do, you have thousands of conversations, synthesize that into a worldview, you may try and play with those tools, you can’t always do that. I mean, to your point, play with things and break things, but how deep can you go? 
I’m talking to developers that are writing in Rust; they’re writing in Go, they’re writing in Node, they’re writing in, you know, all of these programming languages under the sun. I don’t know every programming language, so you have to synthesize. I know a little bit and enough to probably cut off my own thumb, but it’s about trying to understand people’s experience. And then, of course, you have a chance to bring some quantitative things to the table. That was one of the things that RedMonk for a long time, we’d always—we were always very wary of, sort of, quantitative models in research because you see this stuff, it’s all hockey sticks, it’s all up and to the right—Corey: Yeah. You have that ridiculous graph thing, which I’m sorry, I’m sure has an official name. And every analyst firm has its own magic name, whether it’s a ‘Magic Quadrant,’ or the ‘Forrester Wave,’ or, I don’t know, ‘The Crushing Pit Of Despair.’ I don’t know what company is which. But you have the programming language up-and-to-the-right line graph that I’m not sure the exact methodology, but you wind up placing slash ranking all of the programming languages that are whatever body of work you’re consuming—I believe it might be Stack Overflow—James: Yeah. Corey: —and people look for that whenever it comes out. And for some reason, no one ever yells at you the way that they would if you were—oh, I don’t know, a woman—or someone who didn’t look like us, with our over-represented faces. James: Well, yeah. There is some of that. I mean, look, there are two defining forces to the culture. One is outrage, and if you can tap into people’s outrage, then you’re golden—Corey: Oh, rage-driven development is very much a thing. I guess I shouldn’t be quite as flippant. It’s kind of magic that you can wind up publishing these things as an organization, and people mostly accept it. 
People pay attention to it; it gets a lot of publicity, but no one argues with you about nonsense, for the most part, that I’ve seen. James: I mean, so there’s a couple of things. One is outrage; universal human thing, and too much of that in the culture, but it seems to work in terms of driving attention. And the other is confirmation bias. So, I think the beauty of the programming language rankings—which is basically a scatterplot based on looking at conversations in Stack Overflow and some behaviors in GitHub, and trying to understand whether they correlate—we’re very open about the methodology. It’s not something where—there are some other companies where you don’t actually know how they’ve reached the conclusions they do. And we’ve been doing it for a long time; it is somewhat dry. I mean, when you read the post the way Stephen writes it, he really does come across quite academic; 20 paragraphs of explication of the methodology followed by a few paragraphs explaining what we found with the research. Every time we publish it, someone will say, “CSS is not a programming language,” or, “Why is COBOL not on there?” And it’s largely a function of methodology. So, there’s always rage to be had. Corey: Oh, absolutely. Channeling rage is basically one of my primary core competencies. James: There you go. So, I think that it’s both. One of the beauties of the thing is that on any given day when we publish it, people either want to pat themselves on the back and say, “Hey, look, I’ve made a really good choice. My programming language is becoming more popular,” or they are furious and like, “Well, come on, we’re not seeing any slow down. 
I don’t know why those RedMonk folks are saying that.” So, in amongst those two things, the programming language rankings was where we began to realize that we could have a footprint that was a bit more quantitative, and trying to understand the breadcrumbs that developers were dropping because the simple fact is—look, when we look at the platforms where developers do their work today, they are in effect instrumented. And you can understand things, not with a survey where a lot of good developers—a lot of people in general—are not going to fill in surveys, but you can begin to understand people’s behaviors without talking to them, and so for RedMonk, that’s really thrilling. So, if we’ve got a model where we can understand things by talking to people, and understand things by not talking to people, then we’re cooking with gas. Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things, because I sell things for a company that deploys an agent, there’s no other reason. Because let’s face it. Agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment, and that’s as easy to install and maintain as a smartphone app. It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage. With Orca Security, there are no overlooked assets, no DevOps headaches, and believe me you will hear from those people if you cause them headaches, and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have. Corey: One of the I think most defining characteristics about you is that, first, you tend to undersell the weight your words carry. 
And I can’t figure out, honestly, whether that is because you’re unaware of them, or you’re naturally a modest person, but I will say you’re absolutely one of my favorite Twitter follows; @monkchips. If you’re not following James, you absolutely should be. Mostly because of what you do whenever someone gives you a modicum of attention, or of credibility, or of power, and that is you immediately—it is reflexive and clearly so, you reach out to find someone you can use that credibility to lift up. It’s really an inspirational thing to see. It’s one of the things that if I could change anything about myself, it would be to make that less friction-full process, and I think it only comes from practice. You’re the kind of person I think—I guess I’m trying to say that I aspire to be in ways that are beyond where I already am.James: [laugh]. Well, that’s very charming. Look, we are creatures of extreme privilege. I mean, I say you and I specifically, but people in this industry generally. And maybe not enough people recognize that privilege, but I do, and it’s just become more and more clear to me the longer I’ve been in this industry, that privilege does need to be more evenly distributed. So, if I can help someone, I naturally will. I think it is a muscle that I’ve exercised, don’t get me wrong—Corey: Oh, it is a muscle and it is a skill that can absolutely be improved. I was nowhere near where I am now, back when I started. I gave talks early on in my speaking career, about how to handle a job interview. What I accidentally built was, “How to handle a job interview if you’re a white guy in tech,” which it turns out is not the inclusive message I wanted to be delivering, so I retired the talk until I could rebuild it with someone who didn’t look like me and give it jointly.James: And that’s admirable. And that’s—Corey: I wouldn’t say it’s admirable. I’d say it’s the bare minimum, to be perfectly honest.James: You’re too kind. I do what I can, it’s a very small amount. 
I do have a lot of privilege, and I’m aware that not everybody has that privilege. And I’m just a work in progress. I’m doing my best, but I guess what I would say to the people listening is that you do have an opportunity, as Corey said about me just now, maybe I don’t realize the weight of my words, what I would say is that perhaps you have privileges you can share, that you’re not fully aware that you have. In sharing those privileges, in finding folks that you can help, it does make you feel good. And if you would like to feel better, trying to help people in some small way is one of the ways that you can feel better. And I mentioned outrage, and I was sort of joking in terms of the programming language rankings, but clearly, we live in a culture where there is too much outrage. And so to take a step back and help someone, that is a very pure thing and makes you feel good. So, if you want to feel a bit less outraged, feel that you’ve made an impact, you can never finish a day feeling bad about the contribution you’ve made if you’ve helped someone else. So, we do have a rare privilege, and I get a lot out of it. And so I would just say it works for me, and in an era when there’s a lot of anger around, helping people is usually the time when you’re not angry. And there’s a lot to be said for that. Corey: I’ll take it beyond that. It’s easy to cast this in a purely feel-good, oh, you’ll give something up in order to lift people up. It never works that way. It always comes back in some weird esoteric way. For example, I go to an awful lot of conferences during, you know, normal years, and I see an awful lot of events and they’re all—hmm—how to put this?—they’re all directionally the same. The RedMonk events are hands down the exception to all of that. I’ve been to Monktoberfest once, and I keep hoping to go to—I’m sorry, was it Monki Gras is the one in the UK? James: Monki Gras, yeah. Corey: Yeah. 
It’s just a different experience across the board where I didn’t even speak and I have a standing policy just due to time commitments not to really attend conferences I’m not speaking at. I made an exception, both due to the fact that it’s RedMonk, so I wanted to see what this event was all about, and also it was in Portland, Maine; my mom lived 15 minutes away, it’s an excuse to go back, but not spend too much time. So, great. It was more or less a lark, and it is hands down the number one event I will make it a point to attend. And I put that above re:Invent, which is the center of my cloud-y universe every year, just because of the stories that get told, the people that get invited, just the sheer number of good people in one place is incredible. And I don’t want to sound callous, or crass pointing this out, but more business for my company came out of that conference from casual conversations than any other three conferences you can name. It was phenomenal. And it wasn’t because I was there setting up an expo booth—there isn’t an expo hall—and it isn’t because I went around harassing people into signing contracts, which some people seem to think is how it works. It’s because there were good people, and I got to have great conversations. And I kept in touch with a lot of folks, and those relationships over time turned into business because that’s the way it works.James: Yeah. I mean, we don’t go big, we go small. We focus on creating an intimate environment that’s safe and inclusive and makes people feel good. We strongly curate the events we run. As Stephen explicitly says in terms of the talks that he accepts, these are talks that you won’t hear elsewhere. And we try and provide a platform for some different kind of thinking, some different voices, and we just had some magical, magical speakers, I think, at both events over the years. So, we keep it down to sort of the size of a village; we don’t want to be too much over the Dunbar number. 
And that’s where rich interactions between humans emerge. The idea, I think, at our conference is, is that over a couple of days, you will actually get to know some people, and know them well. And we have been lucky enough to attract many kind, and good, and nice people, and that’s what makes the event so great. It’s not because of Steve, or me, or the others on the team putting it together. It’s about the people that come. And they’re wonderful, and that’s why it’s a good event. The key there is we focus on amazing food and drink experiences, really nice people, and keep it small, and try and be as inclusive as you can. One of the things that we’ve done within the event is we’ve had a diversity and inclusion sponsorship. And so folks like GitHub, and MongoDB, and Red Hat have been kind enough—I mean, Red Hat—interestingly enough the event as a whole, Red Hat has sponsored Monktoberfest every year it’s been on. But the DNI sponsorship is interesting because what we do with that is we look at that as an opportunity. So, there’s a few things. When you’re running an event, you can solve the speaker problem because there is an amazing pipeline of just fantastic speakers from all different kinds of backgrounds. And I think we do quite well on that, but the DNI sponsorship is really about having a program with resources to make sure that your delegates begin to look a little bit more diverse as well. And that may involve travel stipends, as well as free tickets, accommodation, and so on, which is not an easy one to pull off.Corey: But it’s necessary. I mean, I will say one of the great things about this past year of remote—there have been a lot of trials and tribulations, don’t get me wrong—but the fact that suddenly all these conferences are available to anyone with an internet connection is a huge accessibility story. When we go back to in-person events, I don’t want to lose that.James: Yeah, I agree. 
I mean, I think that’s been one of the really interesting stories of the—and it is in so many dimensions. I bang on about this a lot, but so much talent in tech from Nigeria. Nigeria is just an amazing, amazing geography, huge population, tons of people doing really interesting work, educating themselves, and pushing and driving forward in tech, and then we make it hard for them to get visas to travel to the US or Europe. And I find that to be… disappointing. So, opening it up to other geographies—which is one of the things that free online events does—is fantastic. You know, perhaps somebody has some accessibility needs, and they just—it’s harder for them to travel. Or perhaps you’re a single parent and you’re unable to travel. Being able to dip into all of these events, I think is potentially a transformative model vis-à-vis inclusion. So, yeah, I hope, A) that you’re right, and, B) that we as an industry are intentional because without being intentional, we’re not going to realize those benefits, without understanding there were benefits, and we can indeed lower some of the barriers to entry participation, and perhaps most importantly, provide the feedback loop. Because it’s not enough to let people in; you need to welcome them. I talked about the DNI program: we have—we’re never quite sure what to call them. We call them mentors or things like that, but people to welcome people into the community, make introductions, this industry, sometimes it's, “Oh, great. We’ve got new people, but then we don’t support them when they arrive.” And that’s one of the things as an industry we are, frankly, bad at, and we need to get better at it.Corey: I could not agree with you more strongly. Every time I wind up looking at building an event or whatnot or seeing other people’s events, it’s easy to criticize, but I try to extend grace as much as possible. 
But whenever I see an event that is very clearly built by people with privilege, for people with privilege, it rubs me the wrong way. And I’m getting worse and worse with time at keeping my mouth shut about that thing. I know, believe it or not, I am capable of keeping my mouth shut from time to time or so I’m told. But it’s irritating, it rankles because it’s people not taking advantage of their privileged position to help others and that, at some point, bugs me.James: Me too. That’s the bottom line, we can and must do better. And so things that, sort of, make you proud of every year, I change my theme for Monki Gras, and, you know, it’s been about scaling your craft, it’s been about homebrews—so that was sort of about your side gig. It wasn’t about the hustle so much as just things people were interested in. Sometimes a side project turns into something amazing in its own right. I’ve done Scandinavian craft—the influence of the Nordics on our industry. We talk about privilege: every conference that you go to is basically a conference about what San Francisco thinks. So, it was nice to do something where I looked at the influence of Scandinavian craft and culture. Anyway, to get to my point, I did the conference one year about accessibility. I called it ‘accessible craft.’ And we had some folks from a group called Code Your Future, which is a nonprofit which is basically training refugees to code. And when you’ve got a wheelchair-bound refugee at your conference, then you may be doing something right. I mean, the whole wheelchair thing is really interesting because it’s so easy to just not realize. And I had been doing these conferences in edgy venues. And I remember walking with my sister, Saffron, to check out one of the potential venues. It was pretty cool, but when we were walking there, there were all these broken cobblestones, and there were quite a lot of heavy vehicles on the road next to it. 
And it was just very clear that for somebody that had either issues with walking or frankly, with their sight, it just wasn’t going to fly anymore. And I think doing the accessibility conference was a watershed for me because we had to think through so many things that we had not given enough attention vis-à-vis accessibility and inclusion.Corey: I think it’s also important to remember that if you’re organizing a conference and someone in a wheelchair shows up, you don’t want to ask that person to do extra work to help accommodate that person. You want to reach out to experts on this; take the burden on yourself. Don’t put additional labor on people who are already in a relatively challenging situation. I feel like it’s one of those basic things that people miss.James: Well, that’s exactly right. I mean, we offered basically, we were like, look, we will pay for your transport. Get a cab that is accessible. But when he was going to come along, we said, “Oh, don’t worry, we’ve made sure that everything is accessible.” We actually had to go further out of London. We went to the Olympic Park to run it that year because we’re so modern, and the investments they made for the Olympics, the accessibility was good from the tube, to the bus, and everything else. And the first day, he came along and he was like, “Oh, I got the cab because I didn’t really believe that the accessibility would work.” And I think on the second day, he just used the shuttle bus because he saw that the experience was good. So, I think that’s the thing; don’t make people do the work. It’s our job to do the work to make a better environment for as many people as possible.Corey: James, before we call it a show, I have to ask. Your Twitter name is @monkchips and it is one of the most frustrating things in the world trying to keep up with you because your Twitter username doesn’t change, but the name that goes above it changes on what appears to be a daily basis. 
I always felt weird asking you this in person, when I was in slapping distance, but now we’re on a podcast where you can’t possibly refuse to answer. What the hell is up with that?James: Well, I think if something can be changeable, if something can be mutable, then why not? The weird thing with Twitter is that it enables that, and it’s just something fun. I know it can be sort of annoying to people. I used to mess around with my profile picture a lot; that was the thing that I really focused on. But recently, at least, I just—there are things that I find funny, or dumb, or interesting, and I’ll just make that my username. It’s not hugely intentional, but it is, I guess, a bit of a calling card. I like puns; it’s partly, you know, why you do something. Because you can, so I’ve been more consistent with my profile picture. If you keep changing both of them all the time, that’s probably suboptimal.Corey: Sounds good. It just makes it hard to track who exactly—“Who is this lunatic, and how did they get into my—oh, it’s James, again.” Ugh, branding is hard. At least you’re not changing your picture at the same time. That would just be unmanageable.James: Yeah, no, that’s what I’m saying. I think you’ve got to do—you can’t do both at the same time and maintain—Corey: At that point, you’re basically fleeing creditors.James: Well, that may have happened. Maybe that’s an issue for me.Corey: James, I want to thank you for taking as much time as you have to tolerate my slings, and arrows, and other various vocal devices. If people want to learn more about who you are, what you believe, what you’re up to, and how to find you, where are you hiding?James: Yeah, I mean, I think you’ve said already, that was very kind: I am at @monkchips. I’m not on topic.
I think as this conversation has shown, I [laugh] don’t think we’ve spoken as much about technology as perhaps we should, given the show is normally about the cloud.Corey: The show is normally about the business of cloud, and people stories are always better than technology stories because technology is always people.James: And so, yep, I’m all over the map; I can be annoying; I wear my heart on my sleeve. But I try and be kind as much as I can, and yeah, I tweet a lot. That’s the best place to find me. And definitely look at redmonk.com. But I have smart colleagues doing great work, and if you’re interested in developers and technology infrastructure, we’re a great place to come and learn about those things. And we’re very accessible. We love to talk to people, and if you want to get better at dealing with software developers, yeah, you should talk to us. We’re nice people and we’re ready to chat.Corey: Excellent. We will, of course, throw links to that in the [show notes 00:37:03]. James, thank you so much for taking the time to speak with me. I really do appreciate it.James: My pleasure. But you’ve made me feel like a nice person, which is a bit weird.Corey: I know, right? That’s okay. You can go for a walk. Shake it off.James: [laugh].Corey: It’ll be okay. James Governor, analyst and co-founder at RedMonk. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment in which you attempt to gatekeep being an industry analyst.Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.This has been a HumblePod production. Stay humble.
About SeanSean Kilgore is an Architect at Twilio, where he draws boxes, lighthouses and soapboxes. In Sean’s spare time, he enjoys reading, walking, gaming, and a well-made drink.Links: Twilio: https://www.twilio.com/ Silvia Botros's Twitter: https://twitter.com/dbsmasher Sean's Twitter: https://twitter.com/log1kal TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translates to DevOps success. All that and more awaits you. Visit: www.puppet.com to download your copy of the report now!Corey: Up next we’ve got the latest hits from Veem. It’s climbing charts everywhere and soon it’s going to climb right into your heart. Here it is!Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Sean Kilgore who’s an architect at a small company called Twilio. Sean, welcome to the show.Sean: Corey, it’s a pleasure to be here.Corey: It really is.
You’re one of those fun people that I always mean to catch up with and really do a deep dive in, but we keep passing like ships in the night. And in fact, I want to go back to more or less what is pretty damn close to your first real job in technology. You were a network administrator at Lutheran High School in Orange, California.Sean: I was.Corey: And at that same time, I was a network administrator down the street at Chapman University, also in Orange, California. And despite that, we have traveled in many of the same circles since, but we have never met in person despite copious opportunities to do so.Sean: That is amazing.Corey: Talking to you is like looking into a funhouse mirror of what would it have been like if I could, you know, hold down a job, and was actually good at things. It’s really fun. Apparently, I’d be able to grow a better beard.Sean: I don’t know about that. My beard is pretty patchy.Corey: Yeah, I look like an angry 14-year-old trying to prove a point to Mommy and Daddy. But that’s not really the direction that we need to take this in today. And you’ve done a lot of stuff that aligns with things that are near and dear to my heart. For the last—what is it now?—six years and change? Seven years and change? You’ve been at SendGrid, then Twilio through acquisition?Sean: Mm-hm.Corey: And you have done basically every operations-looking job at that company. You’ve had a bunch of titles. You wound up going from DevOps engineer to a team lead, then to a senior DevOps engineer again, and you call—you voluntarily move back to an individual contributor role. Let’s start there. What was management like?Sean: The management was interesting. My first go at that, I had no idea what I was doing, and so I didn’t know how to ask for the help that I needed. And so my wife and I refer to that time as the time that I played a lot of video games. Just, I wasn’t prepared for the emotional outlay that managing humans costs. 
And so I would end up spending my nights just playing video games trying to unwind and unpack from all that.I’ve managed twice, now. The second time was—it went much better. I knew more of what I was doing, I had more support. The manager of the team that had left, I had worked with that team very closely in the past, I’d been part of it. And so my whole purpose there was to make sure that we didn’t lose anybody after that until we found a new manager.And that actually worked out pretty well. I had to have some really difficult conversations with some people along the way, but they all stayed, they all told me that they really enjoyed me managing them. And I’ve had people ask me to manage them again after that, which was super bonkers.Corey: It’s always flattering when you have an impact as a manager and people seek you out to work with you again. I dabbled in management as well; no one ever asked me to do it twice, and I have effectively avoided managing people here at The Duckbill Group, just because my belief of what a good manager is—and I think it aligns with what you’ve already said—requires a certain selflessness and ability to focus on others and grow them, whereas my role here is very much as face of the company, and it’s about me. That’s not a recipe for a successful outcome for managing people and not having them rage-quit.Sean: Definitely.Corey: One thing that I find is interesting about management, the higher you rise in an organization, it’s counterintuitive but the more responsibility you get, but the less you can directly affect yourself. Your entire world becomes about effectively delegating work to others and about influence. In your case now, you are an architect, which means different things at different companies. So, I’m not entirely sure I know what it means at Twilio. So first, what is an architect at Twilio? 
And where does your responsibility start and stop, I guess is where I want to go, there?Sean: So, architecture at Twilio is kind of different. Some of the architects that I’ve worked with in the past is the ‘I design everything, and then engineers go and build the thing that I designed.’Corey: Oh, yes. The enterprise architect approach where I’m going to sit in my ivory tower and dispatch, effectively, winged monkeys to implement things that I don’t fully understand, but I have the flashy title. You’re saying that’s not what it is?Sean: There’s a fine distinction here because I think that some of the people I work with would definitely say that I am up in my ivory tower. It is more about—if I’m looking five years out—what capabilities my teams need to be able to provide to execute against a business strategy. That landscape is going to change immensely along the way, and so my job isn’t to say, “We’re going to use Kubernetes because that’s what we need to do five years out.” It happens to be what I’m saying right now, and I’m sure we’re going to go into that, Corey, but it’s less about this is how each of these things should be strung together to achieve that goal and more direction setting. So, I worked on something that I call the ‘Lighthouse,’ and it’s a vision of the future of where we want to go but the caveat is that if you actually go to the Lighthouse, it means you hit the rocks.It’s describing what I call the ‘Bay of Appropriate Futures,’ and you want to land somewhere in that Bay, but it’s not going to be the thing I wrote down five years before we get there. And so it’s much more of a technical leadership position, trying to help other technologists make good technology decisions. And so it’s more about making sure that all of the right questions get asked, not having all the right answers. 
That’s the difference between some architects that I’ve worked with in the past.Corey: And one of the challenges in that role is that you’re not managing people directly. So, what that means is, you are, on some level, not doing a lot of the hands-on keyboard implementation work yourself and, unless I dramatically misunderstand your corporate culture, you’re not empowered to unilaterally fire people, which means that you can only really lead via example and influence. Tell me about that.Sean: When people ask me if they want to be architects, I ask them if they can influence without authority, or if that’s even interesting to them. That is definitely the thing. And so when it comes down to, “Man, I really wish I could fire this person.” A, that never happens. But, B, it’s definitely about modeling behaviors. And there’s a bit of management here in that making expectations clear of senior engineers is part of my job, and helping them also be examples for other engineers is definitely a thing I get graded on.Corey: Influence without authority is sort of the definitional characteristic of being a consultant. It turns out, you can’t even force anything; it has to be the strength of your ideas combined, in some shops, and I admit I have a bit of a somewhat cynical view on this, but also the ability for the client to commit to their sunk-cost fallacy of, “Well, we paid a lot of money to hear this advice, we should probably do something with it.” And there’s always a story of making sure that you’re serving the organization with which you work, well. But when you can only influence rather than direct, it becomes a much more nuanced thing, and I feel like the single differentiator between success and failure in that role is, fundamentally, empathy. Am I wrong on that?Sean: Not at all. 
When I’m working with a very in-the-code engineer who comes to me and is trying to convince their team that they should do something, one of the things that’s a stumbling block for them is that they don’t realize that other people need to be influenced in ways that might be different than the way that they’re influenced. So, as an example, I work with a team of very senior people; I know that some people will respond well if I cite Accelerate, for example. And some people want to hear, “Well, Google does this, so it’s obvious that we should do that.” And trying to thread that needle with everyone in a way that gets everyone on board in the best way for all of us, when you can do that, you can be an architect.Corey: Some weeks, I feel like I’m closer to architect than I do others. It seems that the idea now—solutions architect being a job role that a lot of companies have they hurl out into the universe—is in many ways vastly misunderstood. You want to talk about some kind of architecture story, where I’m going to go ahead and design an architecture that solves some business problem on a whiteboard. That’s not hard to do.The hard part is then controlling for constraints such as, “Yeah, we already have a thing that doesn’t look anything like that, and we want to get it to that point. And oh, by the way, 18 months of downtime while we do that is not acceptable.” Nothing is ever truly greenfield. And adapting to constraints, and making compromises, and being realistic seems to separate the folks who are good at that from the folks who are playacting.Sean: There’s another thing in there where people who’ve worked on the same thing for a long time sometimes have a problem seeing where you could go. And so the constraints there can be really useful in designing things. A lot of people think that greenfield is awesome, but greenfield just means the possible outcomes are the entire universe.
I like working in constraints, essentially.Corey: So, I want to talk to you a little bit about your tenure at Twilio, where you started off at SendGrid and then there was an acquisition, and everyone I know got super-quiet for a while because it turns out when there’s a pending acquisition, talking to people about it is frowned upon, and that goes doubly so in the context of someone who basically shitposts for a living. And I get that; I respect confidentiality, but I also don’t want people to jeopardize their own positions. So, it’s one of those, “Yeah, whatever you’re comfortable telling me, or not, is fine.” So, we didn’t talk for a while. And then the acquisition happened, and now you’re there. And you’ve been there at Twilio for a couple of years or so and haven’t rage-quit, so apparently, it’s worked. What was the transition like?Sean: The transition was really interesting. A lot of people were telling me that acquisitions were universally horrible. And that’s not how it worked. This is the first acquisition I’ve been through, so I have no context. People I trust told me that this one went well.So, in my role as architect, this acquisition was kind of interesting because SendGrid had a very robust, and we’ve done architecture for years. And Twilio’s architecture was a little bit different. It was more like, “There are some really, really senior people at Twilio who have seen some things, and you should probably ask them their opinion on stuff.” But there wasn’t really, like, an architecture review process. There was definitely a, “You need to write down a bunch of stuff and get some people to look at it,” but it wasn’t a, “You need to get approved by your local architect or a group of architects.”Part of that is to provide visibility across the org so that we’re not duplicating work and stuff. But Twilio basically adopted SendGrid’s architecture process, but it grew 10x. So, at SendGrid, we had, I think, six architects.
At Twilio, we have, like, 40 now, so not quite 10x. But trying to copy and paste that process was kind of rough.We’re still, kind of, making that better. And then there were a couple things—as the acquired company, you kind of expect, I don’t know, maybe some housecleaning to happen. And that isn’t what happened. We saw a lot of like really senior leaders move into positions at Twilio of leadership. So, on day one, I think SendGrid’s sales leader became the sales leader for all of Twilio.And that sounded—I haven’t done this before, but it sounded like that’s not normal. And that’s happened in a couple different spots. It’s been pretty neat. And when I think about the acquisition, not just of acquiring another channel for Twilio, but kind of doing an acqui-hire of a bunch of key positions, that was a pretty valuable one.Corey: Let’s talk about one aspect of working at Twilio that I profoundly envy you for, which is working with one of the greatest people in the world: Silvia. Let’s talk about Silvia.Sean: She interviewed me at SendGrid. She’s been here almost a year longer than I have, and it’s been such a joy to work with her. Not just because everything gets Botros’ed around her, and so we have our own built-in chaos monkeys, but also, there’s no one that cares more about making sure that what teams are building won’t come back and bite the team later. She’s worked with, I think, maybe a 10th of the company now—and at Twilio, that’s a lot of teams—trying to just help them do better and make sure that the stuff that they’re building is not going to page them all the time, is actually going to serve the customer in ways that isn’t surprising to the customer. I can probably talk for half an hour about my appreciation for Silvia.Corey: Well, she was a great guest in the early days of this podcast. Silvia Botros is phenomenal. She has the Twitter handle of @dbsmasher so she’s my default go-to on misusing things as databases. 
And she also was just one of the most genuinely kind people I know.She also has an aura effect, where she is basically a walking EMP, and every time someone tries to show her a piece of technology, it explodes in novel and interesting ways, which, frankly, as an acceptance gate for technology is a terrific skill set to have. Does it cause problems in the office?Sean: Not normally. It causes more problems in the office when we are actually in an office together because Silvia, maybe predictively, is also a giant klutz. And so the joke is that she also EMPs herself. In the office, she does break things, but it’s never in an intended way. Or it’s just like a fun, “Oh, man, the WiFi’s down. It must be Silvia.”Corey: Exactly. It’s always nice to have someone you can blame for these things.Sean: It’s SOP.Corey: Yeah, oh, absolutely. At some point do you ever wind up missing things such as, “Oh, it’s probably just Silvia. No, it was actually a problem somewhere?”Sean: So, we actually determine that Silvia’s EMP works at a distance. She flew somewhere close to one of our hardware data centers, and at the time that she passed it, we had an outage. Like, the data center went dark, kind of thing. And so it still happens even if she’s not around. We’re pretty sure it’s not a local phenomenon.Corey: So, the thing that I know is probably going to sound completely boring and ridiculous to half the audience while the other half the audience sits and listens raptly; before I started this place, I never stayed anywhere for longer than two years, because as previously disclosed in multiple directions, I am a terrible employee. First, why did you stay at the same place for as long as you have, and what’s it like? And I’m really hoping you have an answer that isn’t just, “Oh, I have a complete lack of ambition,” because I won’t believe that for a second. 
But it is a tempting cop-out so let me just shut that down now.Sean: No, it’s more I’ve been here for almost eight years now, and I’ve never done the same thing. The fun fact that I tell people when they onboard or I’m interviewing them is, I think I’ve had more titles than anyone at Twilio. I’m up to 11, I think. And so 11 titles in eight years? It hasn’t been the same company.When I started, I was employee number 150. There were 80 of us when I started at SendGrid. I work at a company with 4500 people now and going through that growth, the company that I work for today, and even pre-acquisition, you would not recognize from the day I started at SendGrid. And so if I had been doing the same thing all the time, I wouldn’t still be here. There was a point before we started architecture at SendGrid, I was definitely in a spot kind of a rut, like, “Cool. I can continue to do the same thing over and over,” but I felt like there wasn’t a lot of growth to do.I needed to go see something else, kind of thing. I knew really well how to do our mail stuff, and I felt like I needed to broaden my horizons a bit, or I needed to level up. And at the time, the only place to go up was to management. And then we brought in a chief architect, J.R. Jasperson, and I remember very clearly, it was like his third day or something, we had an all-company meeting—like a lunch thing—and I walked up to him and said, “You don’t know this yet, but you’re my new mentor because it seems like what you’re talking about is really interesting to me.” And he didn’t know it but the subtext there was like, “And if you don’t, I’m out.”And since then, the work that I do day-to-day is completely different. Like, I work for a platform org. This platform org is 130 people right now; it spans everything from building EC2 instances to, recently, it was, like, Twilio API Edge.
There’s such a breadth in there that I never do the same thing every day.Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things, because I sell things for a company that deploys an agent, there's no other reason. Because let’s face it. Agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment -- and that’s as easy to install and maintain as a smartphone app. It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage. With Orca Security, there are no overlooked assets, no DevOps headaches, and believe me you will hear from those people if you cause them headaches, and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have.Corey: That’s functionally, I think, the problem that I had in working in environments as a DevOps type because for the first three months in a job where I’m the first ops person, “Great everything’s on fire.”—I’m an adrenaline junkie in that sense—“Cool. Oh wow, all these problems that I know how to fix.” And then it gets to a reasonable level of working and now it’s just care and feeding of same. Okay, now I’m getting slightly bored, so let me look for other problems in other parts of the org.And that doesn’t go super well when you’re not welcome in those parts of the org which leads to a whole bunch of challenges I’ve had in my career. This is incidentally why being a consultant aligns so well with me and the way I approach things. It’s cool. I’m going to come in; I’m going to fix things, and then I get to leave.
On day one, we know this is a time-bound engagement and that’s okay.Instead of going down the path of the lies everyone tells themselves where average tenure in this space is 18 to 24 months, but magically we’re all going to lie and pretend in the interview that this is their forever job and suddenly you’re going to stay here for 25 years and get a pension and a gold watch when you retire. And it’s, oh wow that’s amazing it sounds like everybody having these conversations wearing old-timey stovepipe hats. There’s just so much that isn’t realistic in those conversations. So, I talk to people who’ve been down those paths who’ve been at the same company for a decade or two, and the common failure mode there is that they have a year or so of experience that they repeat 10 or 20 times. And that’s sad; people get stuck. What you say absolutely resonates with me in that every year is a different thing that you’re working on. You’re not doing the same thing twice. I get antsy when too many days look the same, one to the next.Sean: I definitely hear that. If my every day was, come in join a stand-up, talk about the problem that I had last week and still have today, it wouldn’t work for me. I feel lucky that I work for an organization where outside input is actually requested and honored, so if I go to a team and just happened to have noticed something and say, “Hey, this right here you might want to take a look at. And I have some opinions here if you’d like to hear them.” I normally get asked for that opinion, and it normally turns out pretty good.There’s definitely times where it’s been, like, “No, Sean. You don’t know what you’re talking about.” And normally they’re right. It’s definitely not the same. People say you should be at a company for 18 to 24 months. And that’s true if your company is totally shortchanging you. When I ask my peers at other companies about, have I gotten stiffed by staying at the same place for this long? It’s definitely not. 
And if that wasn’t true, if Twilio was holding back my compensation, maybe this would have gone a different way, but it’s not what’s happened.Corey: Oh, true and to be clear that is very often the biggest criticism I have of people who stay at one company for a long time. They don’t realize what market rate is anymore and they find themselves in a scenario where, “Wow, I could go somewhere else and triple my salary,” which is not an exaggeration and an unpleasant discovery when people realize that they’ve been taken advantage of. And credit where due, I have had conversations with people at Twilio who have been there a long time. And I have never gotten the impression that that is what’s going on there. Your compensation is fair. I want to be very clear here. This is not one of those, “Oh, yeah, I’m just trying to be polite because someone’s being taken advantage of and doesn’t even know it.” No. They’re doing right by their employees. The fact that I have to call them out explicitly as an example of a rarity of a company doing right by its employees, is monstrous.Sean: It is. We’ve hired a few people recently where I found out that I think their pay was close to doubled just by coming here, and I just wish that it was more okay to call out their prior employers publicly and be like, “Cool. If you work for this company, we will probably pay you a ton more.”Corey: And that’s the other side of it, too, which I did early on in my career. It’s, “Oh, I’m leaving this company and screw you all.” “Well, why are you leaving?” “Oh, because I’m getting a 5% raise to change jobs.” I’m not saying that money should not factor into it, but at some point, when all is said and done at that scale, it works out to be 100 bucks a paycheck, or so, is it really worth changing for that? Maybe.If there are things you don’t like about the environment, please don’t let me dissuade people from interviewing for jobs.
You always should be doing that, on some level, just so you know what the market looks like. But I’m also a big believer in, you don’t need to be as mercenary as I was early on in my career. A lot of it was shaped by environments—not Chapman, I want to be clear—that were not particularly kind to staff. And that I felt taken advantage of because I was. And as a result, “Oh, screw me? Screw you.” And it became a very mercenary approach that didn’t serve me well. That is now a baked-in aspect of how I view careers in some respects, and that is something of a problem that I wrestle with.Sean: The mercenary thing?Corey: Yeah. I wrestle with the mercenary thing just because when I talk to someone who’s having a challenge at work, or something, my default instinctive gut reaction that I’ve learned to suppress is, “Oh, screw ‘em. Quit and find another job.”Sean: Ah, gotcha.Corey: That’s not the most constructive way to work in the context of a company where you’re building a career trajectory, and a reputation, and you’ve been there for five years, and maybe rage-quitting because you didn’t wind up getting to pick the title of that presentation isn’t the best answer. I can be remarkably petty, for the reasons I’ll leave a company. But that’s not constructive, and I try very hard to avoid giving that advice unnecessarily to people.Sean: It’s definitely just, like, incidents, right? It’s never a root cause; it’s a contributing factor, and pay is just one contributing factor. I find that a lot of people, even if they’re being taken advantage of compensation-wise, they won’t leave unless there’s something else wrong.Corey: Yeah, compensation is absolutely a symptom, and in most cases, that’s not the real reason people are going to leave. I assure you, people who work at The Duckbill Group could make more money, objectively, somewhere else. But there’s a question of what people value. We pay people well, but we don’t offer FAANG money, with the equity upside and the rest.
We’re not trying to pull the Netflix and pay absolute top-of-market in all cases to all people.I would love to be able to do that; our margins don’t yet support that. Thanks to our sponsors, we’re going to continue to ratchet those prices way up. I kid. I kid. But there are business reasons why things are the way they are. What we do offer instead is things that contribute to a workplace we want to work at. More of us are parents than aren’t.We don’t expect people to work outside of business hours in almost any scenario, short of, you know, re:Invent or something. There’s a very human approach to it. We’re not VC-backed at all, so we don’t ever have to worry about trying to sprint to hit milestones over debt as a company. We have this insane secret approach called ‘revenue’ and ‘profitability’ that means we can continue to iterate month to month, and as long as the trend line continues, we’re happy.Sean: That kind of sustainability is awesome, and is a really good indicator that a company is going to be successful, to me. Especially smaller companies; the decision to not take VC money. And to chase sustainable revenue growth, I know everyone wants to chase the hockey sticks, but at what cost?Corey: Yeah. And I think that people put this on job-seekers way too much. I have been confronted, at one point—I will not name the company—when I was interviewing years ago, and I was asked by the hiring manager, “Well, it seems like you’ve done a fair bit of job-hopping in the course of your career. What’s up with that?” And they pulled up my LinkedIn profile and went through it, and I said, “You realize most of those were contract gigs?” “Well, I don’t kno—oh, yeah. I guess it was. Oh, that was—huh. I guess so.”So, it was a failed attempt to call bullshit on my job history. And because I don’t take things like that well, I turned it around right back on him, and I said, “No, I appreciate that. 
Thank you for clarifying.” A warning sign is when I thank you for insulting me, because what’s coming next is always going to sting. But while we’re on the topic of turnover, “Your team has lost 80% of its members in the last six months. What’s going on with that? Is there a problem here that I should be aware of?” And suddenly, the backpedaling was phenomenal. I did get an offer from that company; I did not accept it.Sean: Good.Corey: You can tell a lot about a company by how they buy their people. And if you’re actively being insulted or hazed in the job interview process, no. I want people who I choose not to hire to come away from the experience feeling respected and that they enjoyed the experience to the point where they would say nice things about us if asked, or even evangelize us without ever even having to be asked. And so far, we’ve done that because we’re very intentional on how we approach things. And man, am I tired of people doing this badly.Sean: When I interview someone, I want them to leave, and then if they don’t take the job, I want it to be because it wasn’t the right job for them. Or, like, the team wasn’t the right fit, not because anything happened in the interview process that was a red flag. That’s the worst. I want Twilio to be a spot where there are no red flags. That would be ideal.Corey: Absolutely. I think that so many folks get it wrong, where there’s this idea of, “Oh, I’m going to interview you. And oh, you’re an ops person. Great. I want you to implement Quicksort on the whiteboard.” And it’s, “Question one: do you do that a lot here? And two: no, of course you don’t because I’ve seen your services list. There’s no rhyme or reason to the order it appears in. Maybe someone should implement Quicksort in production.” And then there’s the other side, too, of, “Oh, great. There’s this broad skill set across the entire space. 
I’m going to figure out where you’re weak and then needle you on those.” I don’t like hiring for absence of weakness; I like hiring for, you’re really good at things we need here and you’re acceptable at the things that are non-negotiable, and able to improve in areas where it becomes helpful.Sean: Yeah. The best interview process I ever had, they flew me up to San Francisco, I worked with them for a day on a real problem that they had, like, pair programming. They offered me the job—it didn’t work out because I didn’t want to move to San Francisco, it turned out—but that interview process was super valuable to me as the candidate because I found out exactly how a day at that job would work; what it would look like.Corey: I had a very similar experience once and the cherry on the top was they paid me a nominal contracting rate for the day—Sean: Same.Corey: —because it was touching things that they were doing. And I think that that’s another anti-pattern of, that was a thing that also just happens to be a thing we’re going to use in production, but we’re not going to tell you that we’re not going to compensate you for it. I’ll work on toy problems; not production in an interview context.Sean: I wanted to circle back to one thing about leaving a company, like, rage-quitting. It’s essentially—if you rage-quit because of a problem, like, a small thing, you’re missing an opportunity to grow. And especially if I had one superpower, I would say that it’s probably managing up. Part of this is just, I have a lot of privilege that lets me do that, but it is definitely a skill that I wish more people had for their own careers.Corey: I really do, too. We spent all this time practicing how to be a candidate in a job interview, and almost no time training people how to be a good interviewer, and what you’re looking for. 
And you wind up with terrible things like, “I had this problem once in production that I thought was super clever, so I’m going to set it up for you and see how you would solve that problem. And if you don’t follow the exact same path that I did, then we’re going to go ahead and just keep shooting down anything else you suggest.” No, stop it.Sean: I do a lot of interviewing, and so I love when I learn something from a candidate because I can ask them questions that are like, “How did you figure this out? How did you even notice that this was a problem?” and you get to go really deep in something they know, the way they know it. We used to do the, like, “Build us an LRU cache in the best big-O notation time.” And if you didn’t get it, you didn’t get the job, if you did it in slower than optimum time. And I remember leaving one of the interviews and doing the recap, and it was like, if anyone came to work and did this, I would be upset at them for wasting time. This is part of the standard library of all the things that we do. Why are we asking this question? I know for a while we stopped asking the question, which is great. I don’t do a lot of code interviews at Twilio, so I don’t know if we do something similar there, yet. I should go find out.Corey: I do not know either way, to be clear. None of the stories I’m talking about involved Twilio. Though I will say, I went on an interview years ago at SendGrid in Anaheim, and I don’t know if I ever got a formal rejection or not afterwards, but regardless, they did not opt to hire me. In hindsight, good decision.Sean: I wonder if we were in the office at the same time.Corey: It would have been 2006, so I think it might have been a bit before your time.Sean: That was before my time.Corey: And very much, credit where due, I started my career in large-scale email systems, so SendGrid was one of those. Oh, I could probably apply the skill set there. 
The problem, of course, was that it became pretty apparent, even in those days, that eventually there weren’t going to be that many companies that needed that skillset. The days of an email admin in every company were drawing to a close, and it was time to evolve or die.Sean: You’re welcome.Corey: Of course. And again, SendGrid today, under the hood—deep under the hood—does still power Last Week in AWS. You folks send emails and get them where they need to go, for which I thank you, and the rest of the world probably does not most weeks.Sean: [laugh]. Yeah.Corey: Ugh. So, we’ve covered a lot of wide-ranging topics. If people want to hear more about who you are, and what you have to say, where can they find you?Sean: I’m on Twitter at @log1kal with a one and a K because I hate people who want to find me, apparently. But that’s @log1kal. Twitter’s probably the only thing.Corey: Excellent. We’ll, of course, put a link to that in the show notes. Sean, thank you so much for speaking with me today. I really appreciate it.Sean: Thank you so much, Corey. I love having these kinds of conversations. I love that there is no plan; we’re just going to have a conversation and record it. I love listening to these kinds of podcasts.Corey: Well, I like creating these kinds of podcasts because the other ones take way too much work.Sean: [laugh].Corey: Sean Kilgore, architect at Twilio. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment explaining how it is almost certainly the fault of Silvia Botros’s aura.Sean: [laugh].Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. 
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
About Nigel
Nigel Kersten’s day job is Field CTO at Puppet where he leads a group of engineers who work with Puppet’s largest customers on cultural and organizational changes necessary for large-scale DevOps implementations - among other things. He’s a co-author of the industry-leading State Of DevOps Report and likes to evenly talk about what went right with DevOps and what went wrong based on this research and his experience in the field. He’s held multiple positions at Puppet across product and engineering and came to Puppet from the Google SRE organization, where he was responsible for one of the largest Puppet deployments in the world. Nigel is passionate about behavioral economics, electronic music, synthesizers, and Test cricket. Ask him about late-stage capitalism, and shoes.
Links:
Puppet: https://puppet.com
2020 State of DevOps Report: https://puppet.com/resources/report/2020-state-of-devops-report/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! 
Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translates to DevOps success. All that and more awaits you. Visit: www.puppet.com to download your copy of the report now!Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted episode is sponsored by a long time… I wouldn’t even say friends so much as antagonist slash protagonist slash symbiotic company with things I have done as I have staggered through the ecosystem. There’s a lot of fingers of blame that I can point throughout the course of my career at different instances, different companies, different clients, et cetera, et cetera, that have shaped me into the monstrosity that I am today. But far and away, the company that has had the most impact on the way that I speak publicly is Puppet. Here to accept the recrimination for what I’ve become and how it’s played out is Nigel Kersten, a field CTO at Puppet—or the field CTO; I don’t know how many of them they have. Nigel, welcome to the show, and how unique are you?Nigel: Thank you, Corey. Well, I—you know, reasonably unique. 
I think that you get used to being one of the few Australians living in Portland who’s decided to move away from the sunny beaches and live in the gray wilderness of the Pacific Northwest.Corey: So, to give a little context into that ridiculous intro, I was a traveling contract trainer for the Puppet fundamentals course for an entire summer back in, I want to say, 2014, but don’t hold me to it. And it turns out that when you’re teaching a whole bunch of students who have paid, in many cases, a couple thousand dollars out of pocket to learn a new software where, in some cases, they feel like it’s taking their job away because they view their job, rightly or wrongly, as writing the same script again and again. And then the demo breaks and people are angry, and if you don’t get a good enough rating, you’re not invited to continue, and then the company you’re contracting through hits you with a stick, it teaches you to improvise super quickly. So, I wasn’t kidding when I said that Puppet was in many ways responsible for the way that I give talks now. So, what do you have to say for yourself?Nigel: Well, I have to say, congratulations for surviving opinionated, defensive nerds who think not only you but your entire product you’re demoing could be replaced by a shell script. It’s a tough crowd.Corey: It was an experience. And some of these were community-based, and some of them were internal to a specific company. And if people have heard more than one episode of this show, I’m sure they can imagine how that went. I gave a training at Comcast once and set a personal challenge for myself of how many times could I use the word ‘comcastic’ in a three-day training. And I would work it in and talk about things like the schedule parameter in Puppet where it doesn’t guarantee something’s going to execute in a time window; it’s the only time it may happen. If it doesn’t fire off then, it isn’t going to happen. It’s like a Comcast service appointment. 
And then they just all kind of stared at me for a while and, credit where due, that was the best user rating I ever got from people sitting through one of my training. So, thanks for teaching me how to improve at, basically, could have been a very expensive mistake on Puppet’s part. It accidentally worked out for everyone.Nigel: Brilliant, brilliant. Yes, you would have survived teaching the spaceship operator to that sort of a crowd.Corey: Oh, I mostly avoided that thing. That was an advanced Puppet-ism, and this was Puppet fundamentals because I just need to be topically good at things, not deep-dive good at things. But let’s dig into that a little bit. For those who have not had the pleasure of working with Puppet, what is it?Nigel: Sure? So, Puppet is a pretty simple DSL. You know, DSLs aren’t necessarily in favor these days.Corey: Domain-specific language, for those who have not—Nigel: Yep.Corey: —caught up on that acronym. Yes.Nigel: So, a programming language designed for a specific task. And, you know, instead, we’ve decided that the world will rest on YAML. And we’ve absorbed a fair bit of YAML into our ecosystem, but there are things that I will still stand by are just better to do in a programming language. ‘if x then y,’ for example, it’s just easier to express when you have actual syntax around you and you’re not, sort of, forcing everything to be in a data specification language. So, Puppet’s pretty simple in that it’s a language that lets you describe the state that infrastructure should be.And you can do this in a modular and composable way. So, I can build a little chunk of automation code; hand it to Corey; Corey can build something slightly bigger with it; hand it to someone else. And really, this sort of collaboration is one of the reasons why Puppet’s, sort of, being at the center of the DevOps movement, which at its core is not really about tools. 
It’s about reducing friction between different groups.Corey: Back when I was doing my traveling training shtick, I found that I had to figure out a way to describe what Puppet did to folks who were not deep in the space, and the analogy that I came up with that I was particularly partial to was, imagine you get a brand new laptop. Well, what do you do with it? You install your user account and go through the setup; you install the programs that you use, some which have licenses on it; you copy your data onto it; you make sure that certain programs always run on startup because that’s the way that you work with these things; you install Firefox because that’s the browser of choice that you go with, et cetera, et cetera. Now, imagine having to do that for, instead of one computer, a thousand of them, and instead of a laptop, they’re servers. And that is directionally what Puppet does.Nigel: Absolutely. This is the one I use for my mother as well. Like, I was working around Puppet for years before—and the way I explained it was, “You know when you get a new iPad, you’ve got to set up your Facebook account and your email. Imagine you had ten thousand of these.” And she was like—I was like, “You know, companies like Google, company like big banks, they all have lots and lots and lots of computers.” And she was like, “They run all those things on iPads.” And I was like, “This is not really where my analogy was going.” But.Corey: Right. And increasingly, though, it seems like the world has shifted in some direction where, when you explain that to your mother and she comes back with, “Well, wouldn’t they just put the application into Docker and be done with it?” Oh, dear. 
But that seems to be in many ways that the direction that the zeitgeist has moved in, whether or not that is the reality in many environments, where when you’re just deploying containers everywhere—through the miracle of Kubernetes—if you’ll pardon the dismissive scorn there, that you just package up your application, shove into a container, and then hurl it from the application team over the operations team, like a dead dog cast into your neighbor’s yard for him to worry about. And then it sort of takes up the space of you don’t have to manage state anymore because everything is mostly stateless in theory. How have you seen it play out in practice in the last five years?Nigel: I mean, that’s a real trend. And, you know, the size of a container should be [laugh] smaller than an operating system. And the reality is, I’m a sysadmin; I love operating systems, I nerded out on operating systems. They’re a necessary evil, they’re terrible, terrible things: registry keys, config files, they’re a pain in the neck to deal with. And if you look at, I think what a lot of operations folks missed about Docker when it started was that it didn’t make their life better. It was worse.It was, like, this actual, sort of, terrible toolchain where you sort of tied together all these different things. But really importantly, what it did is it put control into the hands of the developers, and it was the developers who were trying to do stuff who were trying to shift into applications. And I think Docker was a really great technology, in the sense of, you know, developers could ship value on their own. And that was the huge, huge leveling up. It wasn’t the interface, it wasn’t the user experience, it wasn’t all these things, it was just that the control got taken away from the IT trolls in their basement going, “No, don’t touch my servers,” and instead given straight to the developers. And that’s huge because it let us ship things faster. 
And that’s ultimately the whole goal of things.Corey: The thing that really struck me the most from conducting the trainings that I did was meeting a whole bunch of people across the country, in different technological areas of specialty, in different states of their evolution as technologists, and something that struck me was just how much people wound up identifying with the technology that they worked on. When someone is the AIX admin, and the AIX machines are getting replaced with Linux boxes, there’s this tendency to fight against that and rebel, rather than learning Linux. And I get it; I’m as subject to this as anyone is. And in many cases, that was the actual pushback that I saw against adopting something like Puppet. If I identify my job as being the person that runs all these carefully curated scripts that I’ve spent five years building, and now that all gets replaced with something that is more of a global solution to my local problem, then it feels like a thing that made me special is eroding.And we see that with the migration to cloud as well. When you’re the storage admin, and it just becomes an API call to S3, that’s kind of a scary thing. And when you’re one of the server hugger types—and again, as guilty as anyone of this—and you start to see cloud coming in as, like, a rising tide that eats up what it was that you became known for, it’s scary and it becomes a foundational shift in how you view yourself. What I really had a lot of sympathy for was the folks who’ve been doing this for 20 years. They were, in some cases, a few years away from retirement, and they’ve been doing basically the same set of tasks every year for 25 years.It’s one year of experience repeated 25 times. And they don’t have that much time left in their career, intentionally, so they want to retire, but they also don’t really want to learn a whole bunch of new technologies just to get through those last few years. I feel for them. 
But at the same time—Nigel: No, me too, totally. But what are you going to do? But without sounding too dismissive there, I think it’s a natural tendency for us to identify with the technology if that’s what you’re around all the time. You know, mechanics do this, truck drivers with brands of trucks, people, like, to build attachments to the technology they work with because we fit them into this bigger techno-social system. But I have a lot of empathy for the people in enterprise jobs who are being asked to change radically because the cycle of progress is speeding up faster and faster.And as you say, they might be a few years away from retirement. I think I used to feel more differently about this when I was really hot-headed and much more of a tech enthusiast, and that’s what I identified with. In terms of, it’s okay for a job to just be a job for people. It’s okay for someone to be doing a job because they get good health care and good benefits and it’s feeding their family. That’s an important thing. You can’t expect everyone to always be incredibly passionate about technology choices in the same way that I think many of us who live on Twitter and hanging out in this space are.Corey: Oh, I have no problem whatsoever with people who want to show up for 40 hours a week-ish, work on their job, and then go home and have lives and not think about computers at all. There’s this dark mass of developers out there that basically never show up on Twitter, they aren’t on IRC, they don’t go to conferences, and that’s fine. I have no problem with that, and I hope I don’t come across as being overly dismissive of those folks. I honestly wish I could be content like that. I just don’t hold still very well.Nigel: [laugh]. Yeah, so I think you touched on a few interesting things there. 
And some of those we sort of cover in the State of DevOps Report, which is coming out in the next few weeks.Corey: Indeed, and the State of DevOps Report started off at Puppet, and they’ve now done it for, what, 10 years?Nigel: This is the 10th year, which is completely crazy. So, I was looking at the stats as I was writing it, and it’s 10 years of State of DevOps Reports; I think it’s 11 years of DevOps Weekly, Gareth Rushgrove’s newsletter; it’s 12 or 13 years of DevOpsDays that have been going on. This is longer than I spent in primary and high school put together. It’s kind of crazy that the DevOps movement is still, kind of, chugging along, even if it’s not necessarily the coolest kid on the block, now that GitOps, SRE flavor of the month, various kinds of permutations of how we work with technology, have perhaps got a little bit cooler. But it’s still very, very relevant to a lot of enterprises out there.Corey: Yeah. As I frequently say, legacy is a condescending engineering term for ‘it makes money,’ and there’s an awful lot of that out there. Forget cloud, there are still companies wrestling with do we explore this virtualization thing? And that was something I was very against back in 2006, let’s be very honest. I am very bad at predicting the future of technology. And, “I can see this for small niche edge workload cases, where you have a bunch of idle servers, but for the most part, who’s really going to use this in production?” Well, basically everyone because that, in turn, is what the cloud runs on. Yeah, I think we can safely say I got that one hilariously wrong. But hey, if you aren’t going to make predictions, then what’s it matter?Nigel: But the industry pushes you in these directions. So, there was this massive bank in Asia who I’ve been working with for a long time and they were always resistant to adopting virtualization. And then it was only four or five years ago that I visited them; they’re like, “Right. Okay. It’s time. 
We’re rolling out VMware.” And I was like, “So, I’m really curious. What exactly changed in the last year or two in, like, 2014, 2015 that you decided virtualization was the key?” And I’m like—Corey: Oh, there was this jackwagon who conducted this training? Yeah, no, no, sorry. I can’t take credit for that one.Nigel: They couldn’t order one rack unit servers with CD drives anymore because their whole process was actually provisioning with CDs before that point.Corey: Welcome to the brave new world of PXE booting, which is kind of hard, so yeah, virtualization is easier. You know, sometimes people have to be dragged into various ways of technological advancement. Which gets to the real thing I want to cover, since this is a promoted episode, where you’re talking about the State of DevOps Report, I’m almost less interested in what this year’s has to say specifically, than what you’ve seen over the last decade. What’s changed? What was true 10 years ago that is very much not true now? Bonus points if you can answer that without using the word Kubernetes more than twice.Nigel: So, I think one of the big things was the—we’ve definitely passed peak DevOps team, if you may remember, there was a lot of arguments and there’s still regular, is DevOps a job title? Is it a team title? Is it a [crosstalk 00:14:33]—Corey: Oh, I was much on the no side until I saw how much more I would get paid as a DevOps engineer instead of a systems administrator for the exact same job. So, you know, I shut up and I took the money. I figured that the semantic arguments are great, but yeah.Nigel: And that’s exactly what we’ve written in the report. And I think it’s great. The sysadmins, we were unloved. You know, we were in the basement, we weren’t paid as much as programmers. 
The running joke used to be for developers, DevOps meant, “I don’t need ops anymore.” But for ops people, it was, “I can get paid like a developer.”Corey: In many cases, “Oh, well, systems administrators don’t want to learn how to code.” It’s, yeah, you’re remembering a relatively narrow slice of time between the modern era, where systems administrator types need to be able to write in the lingua franca of everything—which is, of course, YAML, as far as programming languages go—and before that, to be a competent systems administrator, you needed to have a functional grasp of C. And—Nigel: Yeah.Corey: —there is only a limited window in which a bunch of bash scripts and maybe a smidgen of Perl would have carried you through. But the deeper understanding is absolutely necessary, and I would argue, always has been.Nigel: And this is great because you’ve just linked up with one of the things we found really interesting about the report is that you know when we talk about legacy we don’t actually mean the oldest shit. Because the oldest shit is the mainframes; it’s a lot of bare metal applications. A lot of that in big enterprises—Corey: We’re still waiting for an AWS/400 to replace some of that.Nigel: Well, it’s administered by real systems engineers, you know, like, the people who wrote C, who wrote kernel extensions, who could debug things. What we actually mean by legacy is we mean late ’90s to late 2000s, early 2010s. Stuff that was put together by kids who, like me, happened to get a job because you grew up with a computer, and then the dotcom explosion happened. You weren’t necessarily particularly skilled, and a lot of people, they didn’t go through the apprenticeships that mainframe folks and systems engineers actually went through. And everyone just held this stuff together with, you know, duct tape and dental floss. And then now we’re paying the price of it all, like, way back down the track. 
So, the legacy is really just a certain slice of rapid growth in applications and infrastructure, that’s sort of an unmanageable mess now.Corey: Oh, here in San Francisco, legacy is anything prior to last night’s nightly build. It’s turned into something a little ridiculous. I feel like the real power move as a developer now is to get a job, go in on day one, rebase everything in the Git repository to a single commit with a message, ‘legacy code’ and then force push it to the main branch. And that’s the power move, and that’s how it works, and that’s also the attitude we wind up encountering in a lot of places. And I don’t think it serves anyone particularly well to tie themselves so tightly to that particular vision.Nigel: Yep, absolutely. This is a real problem in this space. And one of the things we found in the State of DevOps Report is that—let me back up a little and give a little bit of methodology of what we actually do. We survey people about their performance metrics, you know, like how quickly can you do deploys? What’s your mean time to recovery? Those sorts of things, and what practices do you actually employ?And we essentially go through and do statistical analysis on this, and everyone tends to end up in three cohorts, they separate pretty easily, of low, medium, and high evolution. And so one of the things we found is that everyone at the low level has all sorts of problems. They have issues with what does my team do? What does the team next to me do? How do I talk to the team next to me?How do I actually share anything? How do I even know what my goals are? Like, fundamental company problems. But everyone at all levels of evolution is stuck on two big things: not being able to find enough people with the right skills for what they need, and their legacy infrastructure holding them back.Corey: The thing that I find the most compelling is the idea of not being able to find enough people with the skills that they need. 
And I’m going to break my own rule and mention Kubernetes as a prime example of this. If you are effective at managing Kubernetes in production, you will make a very comfortable living in any geographical location on the planet because it is incredibly complex. And every time we’ve seen this in previous trends, where you need to get more and more complexity, and more and more expertise just to run something, it looks like a sawtooth curve, where at some point that complexity, it gets abstracted away and compressed down into something that is basically a single line somewhere, or it happens below the surface level of awareness. My argument has been that Kubernetes is something no one’s going to care about in roughly three years from now, not because we’re not using it anymore, but because it’s below the level of awareness that we have to think about, in the same way that there aren’t a whole lot of people on the planet these days who have to think about the Linux virtual memory management subsystem. It’s there and a few people really care about it, but for the rest of us, we don’t have to think about that. That is the infrastructure underneath our infrastructure.Nigel: Absolutely. I used to make a living—and it’s ridiculous looking back at this—for a year or two, doing high-performance custom compiled Apaches for people. Like, I was really really good at this.Corey: Well yeah, Apache is a great example of this, where back in the ’90s, to get a web server up and running you needed to have three days to spare, an in-depth knowledge of GCC compiler flags, and hope for the best. And then RPM came out and then, okay, then YUM or other things like that—Nigel: Exactly.Corey: —on top of it. And then things like Puppet started showing up, and we saw, all right now, [unintelligible 00:20:01] installed. Great. 
And then we had—it took a step beyond that, and it was, “Oh, now it’s just a Docker-run whatever it is,” and these days, yeah, it’s a checkbox in S3.Nigel: So, let me get your Kubernetes prediction down, right. So, you’re predicting Kubernetes is going to go away like Apache and highly successful things. It’s not an OpenStack failure state; it’s Apache invisibility state?Corey: Absolutely. My timeline is a bit questionable, let’s be fair, but—it’s a little on the aggressive side, but yeah, I think that Kubernetes is inherently too complex for most people to have to wind up thinking about it in that way. And we’re not talking small companies; we’re talking big ones where you’re not in a position, if you’re a giant blue-chip Fortune 50, to hire 2000 people who all know Kubernetes super well, and you shouldn’t have to. There needs to be some flattening of all of that high level of complexity. Without the management tools, though, with things like Puppet and the things that came before and a bunch of different ways, we would all not be able to get anything done because we’d be too busy writing in assembly. There’s always going to be those abstractions on top abstractions on top abstractions, and very few people understand how it works all the way down. But that’s, in many cases, okay.Nigel: That’s civilization, you know? Do you understand what happens when you plug in something to your electricity socket? I don’t want to know; I just want light.Corey: And more to the point, whenever you flip the switch, you don’t have that doubt in your mind that the light is going to come on. So, if it doesn’t, that’s notable, and your first thought is, “Oh, the light bulb is out,” not, “The utility company is down.” And we talk about the cloud being utility computing.Nigel: Has someone put a Kubernetes operator in this light switch that may break this process?Corey: Well, okay, IoT does throw a little bit of a crimp into those works. But yeah. 
So, let’s talk more about the State of DevOps Report. What notable findings were there this year?Nigel: So, one of the big things that we’ve seen for the last couple of years has been that most companies are stuck in the middle of the evolutionary progress. And anyone who deals with large enterprises knows this is true. Whatever they’ve adopted in terms of technology, in terms of working methods, you know, agile, various different things, most companies don’t tend to advance to the high levels; most places stay mired in mediocrity. So, we wanted to dive into that and try and work out why most companies are actually stuck like this when they hit a certain size. And it turns out, the problems aren’t technology or DevOps, they’re really fundamental problems like, “We don’t have clear goals. I don’t understand what the teams next to me do.”We did a bunch of qualitative interviews as well as the quantitative work in the survey with this report, and we talked to one group of folks at a pretty large financial services company who are like, “Our teams have all been renamed so many times, if I need to go and ask someone for something, I literally page up and down through ServiceNow, trying to find out where to put the change request.” And they’re like, “How do I know where to put a network port opening request for this particular service when there are 20 different teams that might be named the right thing, and some are obsolete, and I get no feedback whether I’ve sent it off to the right thing or to a black hole of enterprise despair?”Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things, because I sell things for a company that deploys an agent, there's no other reason. Because let’s face it. Agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment -- and that’s as easy to install and maintain as a smartphone app. 
It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage. With Orca Security, there are no overlooked assets, no DevOps headaches, and believe me you will hear from those people if you cause them headaches, and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have.Corey: That doesn’t get better with a lot of modernization. I mean, I feel like half of my job—and I’m not exaggerating—is introducing Amazonians to one another. Corporate communication between departments and different groups is very far from a solved problem. I think the tooling can help but I’ve never been a big believer in solving political problems with technology. It doesn’t work. People don’t work that way.Nigel: Absolutely. One of my earliest times working at Puppet doing, sort of, higher-level sales and services and support, huge national telco walk in there; we’ve got the development team, the QA team, the infrastructure team. In the course of this conversation, one of them makes a comment about using apt-get, and the others were like, “What do you mean? We’re on RHEL.” And it turned out, production was running on RHEL, the QA team running on CentOS and the developers were all building everything on Ubuntu. And because it was Java wraps, they almost didn’t have to care. But write once, debug everywhere.Corey: History doesn’t repeat, but it rhymes; before Docker, so much of development in startup-land was how do I make my MacBook Pro look a lot more like an EC2 Linux instance? And it turns out that there’s an awful lot of work that goes into that maybe isn’t the best use of people’s time. And we start to see these breakthroughs and these revelations in a bunch of different ways. 
I have to ask. This is the tenth year that you’ve done the State of DevOps Report. At this point, why keep doing it? Is it inertia? Are you still discovering new insights every year on top of it? Or is it one of those things where well someone in marketing says we have to do it, so here we are?Nigel: No, actually, it’s not that at all. So definitely, we’re going to take stock after this year because ten years feels like a really good point to, sort of—it’s a nice round number in a certain kind of number system. Mainly the reason is, a lot of my job is going and helping big enterprises just get better at using technology. And it’s funny how often I just get folks going, “Oh, I read this thing,” like people who aren’t on the bleeding edge, constantly discussing these things on Twitter or whatever, but the State of DevOps Report makes its way to them, and they’re like, “Oh, I read a thing there about how much better it is if we standardized on one operating system. And that made a really huge difference to what we were actually doing because you had all this data in there showing that that is better.”And honestly, that’s the biggest reason why I ended up doing it. It’s the fact that it seems to be a tool that has made its way through to very hard to penetrate enterprise folks. And they’ll read it and managers will read things that are like, “If you set clear goals for your team and get them to focus on optimizing the legacy environment, you will see returns on it.” And I’m being a little bit facetious in the tone that I’m saying because a lot of this stuff does feel obvious if you’re constantly swimming in this stuff day-to-day, but it’s not just the practitioners who it’s just a job for in a lot of big companies. It’s true of a lot of the management chain as well. 
They’re not necessarily going out and reading up on modern agile IT management practices day-to-day, for fun; they go home and do something else.Corey: One of my favorite conferences is Gene Kim’s DevOps Enterprise Summit, and the specific reason behind that is, these are very large companies that go beyond companies, in some cases, to institutions, where you have the US Air Force as a presenter one year and very large banks that are 200 years old. And every other conference, it seems, more or less involves people getting on stage, delivering conference-ware, and telling stories that make people at those companies feel bad about themselves. Where it’s, “We’re Twitter for Pets, and this is how we deploy software,” or the ever-popular, “This is how Netflix does stuff.” Yeah, Netflix has basically no budget constraints as far as hiring engineering folks go, and lest we forget, their failure mode is someone can’t watch a movie right now. It’s not exactly the same thing as the ATM starts spitting out the wrong balance in the streets.And I think that there’s an awful lot of discussion where people look at the stories people tell on conference stages and come away feeling bad from it. Very often, I’ll see someone from a notable tech company talk about how they do things. And, “Wow, I wish my group did things like that.” And the person next to me says, “Yeah, me too.” And I check and they work at the same company.And the stories we tell are not necessarily the stories that we live. And it’s very easy to come away discouraged from these things. And that goes triply so for large enterprises that are regulated, that have significant downside risk if the technology fails them. And I love watching people getting a chance to tell those stories.Nigel: Let me jump in on that really quickly because—Corey: Please, by all means.Nigel: —one is, you know, having done four years at Google, things are a shitshow internally there, too—Corey: You’re talking about it like it’s prison. 
I like it.Nigel: —you know. [laugh]. People get horrified when they turn up and they’re like, “Oh, what, it’s not all gleaming, perfect software artifacts, delivered from the hand of Urs.” But I think what Gene has done with DevOps Enterprise Summit is fantastic in how people share more openly their failure states, but even there—and this is an interesting result we found from a few years ago, State of DevOps Report—even those executives are being more optimistic because it’s so beaten into you as the senior executive; you’re putting on a public face, and even when they’re trying to share the warts-and-all story, they can’t help but put a little bit of a positive spin on it. Because I’ve had exactly the same experience there where someone’s up there telling a war story, and then I look, turn to the person next to me, and they work at that same 300-year-old bank, and they’re like, “Actually, it’s much, much worse than this, and we didn’t fix it quite as well as that.” So, I think the big tech companies are terrible inside unless they’re Netflix, and the big enterprises are also terrible. But they’re also—Corey: No, no, I’ve talked to Netflix people, too. They do terrible things internally there, too. No one talks about the fact that their internal environments are always tire fires, and there are two stories: the stories we tell publicly, and the reality. And if you don’t believe me on that, look at any company in the world’s billing system. As much as we all talk about agile and various implementations thereof when it comes to things that charge customers money, we’re all doing waterfall.Nigel: Absolutely. [laugh].Corey: Because mistakes show when you triple-charge someone’s credit card for the cost of a small country’s GDP. It’s a problem. I want to normalize those sorts of things more. 
I’m looking forward to reading this year’s report, just because it’s interesting to see how folks who are in environments that differ from the ones that I get to see experience this stuff and how they talk about it.Nigel: Yeah. And so one of the big results I think there for big companies that’s really interesting is that one of the, sort of, anti-patterns is having lots of different types of teams. And I kind of touched on this before about having confusing team titles being a real problem. And not being able to cross organizational boundaries quickly is really, really—you know, it’s a huge inhibitor and cause, source of friction. But turns out the pattern that is actually really great is one that the Team Topologies guys have discovered.If you’ve been following what Matthew Skelton and Manuel Pais have been doing for a while, they’ve basically been documenting a pattern in software organizations of a small number of team types, of a platform team, value stream teams, complicated-subsystem teams, and enabling teams. And so we worked with Manuel and Matt on this year’s report and asked a whole bunch of questions to try and validate the Team Topologies model, and the results came back and they were just incredibly strong. Because I think this speaks to some of the stuff you mentioned before that no one can afford to hire an army of Kubernetes developers, and whatever the hottest technology is in five years, most big companies can’t hire an army of those people either. And so the way you get scale internally before those things become commoditized is you build a small team and create the situation where they can have outsized leverage inside their organization, like get rid of all the blockers to fast flow and make their focus self-service to other people. Because if you’re making all of your developers learn distributed systems operations arcane knowledge, that’s not a good use of their time, either.Corey: It’s really not. 
And I think that’s something that gets lost a lot is, I’ve never yet seen a company beyond the very early startup stage, where the AWS bill exceeded the cost of the people working on the AWS bill. Payroll is always a larger expense than infrastructure unless you’re doing something incredibly strange. And, oh, I want to save some money on the cloud bill is very often offset by the sheer amount of time that you’re going to have to pay people to work on that because, contrary to what we believe as engineering hobbyists, people’s time is very far from free. And it’s also the opportunity cost of if you’re going to work on this thing instead of something else, well, is that really the best choice? It comes down to contextualizing what technology is doing as well as with what’s happening over in the world of business strategy. And without having a bridge between those, it doesn’t seem to work very well.Nigel: Absolutely. It’s insane. It’s literally insane that, as an industry, we will optimize 5%, 3% of our infrastructure bill or application workload and yet not actually reexamine business processes that are causing your people to spend 10% of their time in synchronous meetings. You can save so much more money and achieve so much more by actually optimizing for fast flow, and getting out of the way of the people who cost lots of money.Corey: So, one last topic that I want to cover before we call it an episode. You talk to an awful lot of folks, and it’s easy to point at the aspirational stories of folks doing things the right way. But let’s dish for a minute. What are you seeing in terms of people not using the cloud properly? I feel like you might have a story or two on that one.Nigel: I do have a few stories. So, in this year’s report, one of the things we wanted to find out was, like, are people using the cloud in the way we think of cloud; you know, elastic, consumption-based, all of these sorts of things. 
We use the NIST metrics, which I recognize can be a little controversial, but I think you’ve got to start somewhere as a certain foundation. It turns out just about everyone is using the public cloud. And when I say cloud, I’m not really talking about people’s internal VMware that they rebadged as cloud; I’m talking about the public cloud providers.Everyone’s using it, but almost no one is taking advantage of the functionality of the cloud. They’re instead treating it like an on-premise VMware installation from the mid-2000s, they’re taking six weeks to provision instances, they’re importing all of their existing processes, they keep these things running for a long time if they fall over, one person is tasked with, “Hey, do you know how pet number 45 is actually doing here?” They’re not really treating any of these things in the way that they’re actually meant to. And I think we forget about this a lot of the time when we talk about cloud because we jump straight to cloud-native, you know, the sort of bleeding edge of folks in serverless, highly orchestrated containers. I think if you look at the actual numbers, the vast majority of cloud usage, it’s still things like EC2 instances on AWS. And there’s a reason: because it’s a familiar paradigm for people. We’re definitely going to progress past there, but I think it’s easy to leave the people in the middle behind when we’re talking about cloud and how to improve the ecosystem that they all operate in.Corey: Part of the problem, too, is that whenever we look at how folks are misusing cloud, it’s easy to lose sight of context. People don’t generally wake up and decide I’m going to do a terrible job today unless they work in, you know, Facebook’s ethics department or something. Instead, it’s very much a people are shaped by the constraints they’re laboring under from a bunch of different angles, and they’re trying to do the best with what they have. 
Very often, the reason that a practice or a policy exists is because, once upon a time, there was a constraint that may or may not still be there, and going forward the way that they have seemed like the best option at the time. I found that the default assumption that people are generally smart and doing the right thing with the information they have carries you a lot further, in many respects, than what I did as a terrible junior consultant, which is, “Oh, what moron built this?” Invariably to said moron, and then the rest of the engagement rapidly goes downhill from there. Try and assume good faith, and if you see something that makes no sense, ask, “Why is it like this?” Rather than, “Why is it like this?” Tone counts for a lot.Nigel: It’s the fundamental attribution bias. It’s why we think all other drivers on the road are terrible, but we actually had a good reason for swerving into that lane.Corey: “This isn’t how I would have built it. So, it’s awful.”Nigel: Yeah, exactly.Corey: Yeah. And in some cases, though, there are choices that are objectively bad, but I tried to understand where they came from there. Company policy, historically, around things like data centers, trying to map one-to-one to cloud often miss some nuances. But hey, there’s a reason it’s called the digital transformation, not a project that we did.Nigel: [laugh]. And I think you’ve got to always have empathy for the people on the ground. 
I quite often have talked to folks who’ve got, like, a terrible cloud architecture with the deployment and I’m like, “Well, what happened here?” And they went, “Well, we were prepared to deploy this whole thing on AWS, but then Microsoft’s salespeople got to the CTO and we got told at the last minute we’re redeploying everything on Azure.” And so these people were often—you know, you’re given a week or two to pivot around the decision that doesn’t necessarily make any sense to them.And there may have been a perfectly good reason for the CTO to do this: they got given really good kickbacks in terms of bonuses for, like, how much they were spending on the infrastructure—I mean, discounts—but people on the ground are generally doing the best with what they can do. If they end up building crap, it’s because our system, society, capitalism, everything else is at fault.Corey: [laugh]. I have to say, I’m really looking forward to seeing the observations that you wound up putting into this report as soon as it drops. I’m hoping that I get a chance to speak with you again about the findings, and then I can belligerently tell you to justify yourself. Those are my favorite follow-ups.Nigel: [unintelligible 00:37:05].Corey: If people want to get a copy of the report for themselves or learn more about you, where can they find you?Nigel: Just head straight to puppet.com, and it will be on the banner on the front of the site.Corey: Excellent. And we’ll, of course, put a link to that in the show notes, if people can’t remember puppet.com. Thank you so much for taking the time to speak with me. I really appreciate it.Nigel: Awesome. No worries. It was good to catch up.Corey: Nigel Kersten, field CTO at Puppet. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice as well as an insulting comment telling me that ‘comcastic’ isn’t a funny word, and tell me where you work, though we already know.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
This week Corey is joined by Anurag Gupta, founder and CEO of Shoreline.io. Anurag guides us through the large variety of services he helped launch, including RDS, Aurora, EMR, Redshift, and others. The result? Running things almost like a start-up—but with some distinct differences. Eventually Anurag ended up back in the testy waters of start-ups. He and Corey discuss the nature of that transition to get back to solving holistic problems, tapping into conveying those stories, and what Anurag was able to bring to his team at Shoreline.io where automation is king. Anurag goes into the details of what Shoreline is and what they do. Stay tuned for more.Links: Shoreline.io: https://shoreline.io LinkedIn: https://www.linkedin.com/in/awgupta/ Email: anurag@Shoreline.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translate to DevOps success. All that and more awaits you. 
Visit: www.puppet.com to download your copy of the report now!Corey: If you’re familiar with Cloud Custodian, you’ll love Stacklet. Which is made by the same people who made Cloud Custodian, but put something useful on top of it so you don’t need to be a YAML expert to work with it. They’re hosting a webinar called “Governance as Code: The Guardrails for Cloud at Scale” because it’s a new paradigm that enables organizations to use code to manage and automate various aspects of governance. If you’re interested in exploring this you should absolutely make it a point to sign up, because they’re going to have people who know what they’re talking about—just kidding they’re going to have me talking about this. It’s going to be on Thursday, July 22nd at 1pm Eastern. To sign up visit snark.cloud/stackletwebinar and I’ll talk to you on Thursday, July 22nd.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted episode is brought to you by Shoreline, and I’m certain that we’re going to get there, but first, I’m notorious for telling the story about how Route 53 is in fact a database, and anyone who disagrees with me is wrong. Now, AWS today is extraordinarily tight-lipped about whether that’s accurate or not, so the next best thing, of course, is to talk to the person who used to run all of AWS’s database offerings and start off there and get it from the source. Today, of course, he is not at Amazon, which means he’s allowed to speak with me. My guest is Anurag Gupta, the founder and CEO of Shoreline.io. Anurag, thank you for joining me.Anurag: Thanks for having me on the show, Corey. It’s great to be on, and I followed you for a long time. I think of you as AWS marketing, frankly.Corey: The running gag has been that I am the de facto head of AWS marketing as a part-time gag because I wandered past and saw an empty seat and sat down and then got stuck with the role. 
I mostly kid, but there does seem to be, at times, a bit of a challenge as far as expressing stories and telling those stories in useful ways. And some mistakes just sort of persist stubbornly forever. One of them is in the list of services, Route 53 shows up as ‘networking and content delivery,’ which I think regardless of the answer, it doesn’t really fit there. I maintain it’s a database, but did you have oversight into that along with Glue, Athena, all the RDS options, managed blockchain—for some reason—as well. Was it considered a database internally, or was that not really how they viewed it?Anurag: It’s not really how they view it. I mean, certainly there’s a long IP table, right, and routing tables, but I think we characterized it in a whole different org. So, I had responsibility for Analytics, Redshift, Glue, EMR, et cetera, and transactional databases: Aurora, RDS, stuff like that.Corey: Very often when you have someone who was working at a very large company—and yes, Amazon has a bunch of small teams internally, but let’s face it, they’re creeping up on $2 trillion in valuation at the time of this recording—it’s fairly common to see that startups are, “Oh, this person was at Amazon for ages.” As if it’s some sort of amazing selling point because a company with, what is it, 1.2 million people give or take is absolutely like a relatively small just-founded startup culturally, in terms of resources, all the rest. Conversely, when you’re working at scales like that, where the edge case becomes the common case, and the corner case becomes something that happens 18 times an hour, it informs the way you think about things radically differently. 
And your reputation does precede you, so I’m going to opt for assuming that this is, rather than being the story about, “Oh, we’re just going to try and turn this company into the second coming of Amazon,” that there’s something that you saw while you were at AWS that you thought it was an unmet need in the ecosystem, and that’s what Shoreline is setting out to build. Is that slightly accurate? Or no you’re just basic—there’s a figurehead because the Amazon name is great for getting investors.Anurag: No, that’s very astute. So, when I joined AWS, they gave me eight people and they asked me to go disrupt data warehousing and transaction processing. So, those turned into Redshift and Aurora, respectively, and gradually I added on more services. But in that sense, Amazon does operate like a startup. They really believe in restricting the number of resources you get so that you have time and you’re forced to think and be creative.That said, you don’t really wake up at night sweating about whether you’re going to hit payroll. This is, sort of, my fourth startup at this point and there are sleepless nights at a startup and it’s different. I’d go launch a service at AWS and there’ll be 1000 people who are signed up to the beta the next day, and that’s not the way startups work. But there are advantages as well.Corey: I can definitely empathize with that. My last job before I started this place was at a small scrappy startup which was great for three months and then BlackRock bought us, and then, oh, large regulated finance company combined with my personality ended about the way you think it would. And where, so instead of having the fears and the challenges that I dealt with then, I’m going to go start my own company and have different challenges. And yeah, they are definitely different. 
I never laid awake at night worrying about how I was going to make payroll, for example.There’s also the freedom, in some ways, at large companies where whatever function needs to get done, whatever problem you have, there is some department somewhere that handles that almost exclusively, whereas in scrappy startup land, it’s, well, whatever problem needs to get done today, that is your job right now. And your job description can easily fill six pages by the end of month two. It’s a question of trade-offs and the rest. What did you see that gave you the idea to go for startup number four?Anurag: So, when I joined AWS thinking I was going to build a bunch of database engines—and I’ve done that before—what I learned is that building services is different than building products. And in particular, nobody cares about your performance or features if your service isn’t up. Inside AWS, we used to talk about utility computing, you know, metering and providing compute storage database the way, you know, my local utility provider, PG&E, provides power and gas. And if I call up PG&E and say that the power is out at my house, I don’t really want to hear, “Oh, did you know that we have six nines power availability in the state of California?” I mean, the power is still out; go come over here and fix it. And I don’t really care about fancy new features they’re doing back at the plant. Really, all I care about is cost and availability.Corey: The idea of utility computing got into that direction, too, in a lot of ways, in some strange nuances, too. The idea that when I flip the light switch, I don’t stop and wonder, is the light going to turn on? You know, until I installed IoT switches and then everything’s a gamble in the wild times again. And if the light doesn’t come on, I assume that the fuse is out, or the light bulb is blown. “Did PG&E wind up dropping service to my neighborhood?” Is sort of the last question that I have on that list. 
It took a while for cloud to get there, but at this point, if I can’t access something in AWS, my default assumption is that it is my local internet, not the cloud provider. That was hard-won.
Anurag: That’s right. And so I think a lot of other SaaS companies—or anybody operating in the cloud—are now working and struggling to get that same degree of availability and confidence to supply to their customers. And so that’s really the reason for Shoreline.
Corey: There’s been a lot of discussion around the idea of availability and what that means for a business outcome where, I still tell the story from time to time that back in 2012 or so, I was going to buy a pair of underpants on amazon.com, where I buy everything, and instead of completing the purchase, it threw one of the great pictures of staff dogs up. Now, if you listen to a lot of reports on availability, then for one day out of the week, I would just not wear underwear. In practice, I waited an hour, tried it again, the purchase went through and it was fine. However, if that happened every third time I tried to make a purchase, I would spend a lot more money at Target.
There has to be a baseline level of availability. That doesn’t mean that your site is never down, period, because that is, in many cases, an unrealistic aspiration and it turns every outage that winds up coming up down the road into an all-hands-on-deck five-alarm fire, which may not be warranted. But you do need to have a certain level of availability that meets or exceeds your customer’s expectations of same. At least that’s the way that I’ve always viewed it.
Anurag: I think that’s exactly right. I also think it’s important to look at it from a customer perspective, not a fleet perspective. So, a lot of people do inward-facing SRE measurements of fleet-wide availability. Now, your customer really cares about the region they’re in, or perhaps even the particular host they’re on. And that’s even more true if they’ve got data.
So, for example, an individual database failing, it’ll take a long time for it to come back up elsewhere. That’s different than something more ephemeral, like an instance, which you can move more easily.
Corey: Part of the challenge that I’ve noticed as well when dealing with large cloud providers, a recurring joke has been the AWS status page: it is the purest possible expression of a static site because it never changes. And people get upset when things go down and the status page isn’t updated, but the challenge is when you’re talking about something that is effectively global scale, it stops being a question of is it up or is it down and transitions long before then into how up or how down is it? And things that impact one customer may very well completely miss another. If you’re being an absolutist, it will always be a sea of red, which doesn’t tell people anything useful. Whereas if a customer is down and their site is off, they don’t really care that most other customers aren’t affected.
I mean, on some level, you kind of want everyone to be down because that defers headline risk, as well as if my site is having a problem, it could be days before someone gets around to fixing a small bug, whereas if everything is down, oh, this will be getting attention very rapidly.
Anurag: That’s exactly right. Sounds like you’ve done ops before.
Corey: Oh, yes. You can tell that because I’m cynical and bitter about everything.
Anurag: [laugh].
Corey: It doesn’t take long working in operationally-focused roles to get there. I appreciate your saying that though. Usually, people say, “Let me guess. You used to be an ops person.” “How can you tell?” “Because your code is garbage,” is the other way that people go down that path.
And yeah, credit where due; they’re not wrong. You mentioned that back when you were in Amazon, you were given a team of eight people and told to disrupt the data warehouse.
Yeah, I’ve disrupted the data warehouse as a single person before so it doesn’t seem that hard. But I’m guessing you mean something beyond causing an outage. It’s more about disrupting the space, presumably.
Anurag: [crosstalk 00:10:57].
Corey: And I think, looking back from 2021, it’s hard to argue that Amazon hasn’t disrupted the data warehouse space and fifteen other spaces besides.
Anurag: Yeah, so that’s what we were all about, sort of trying to find areas of non-consumption. So clearly, data was growing; data warehousing was not growing at the same rate. We figured that had to do with either a cost problem, or it had to do with a simplicity problem, or something else. Why aren’t people analyzing the data that they’re collecting? So, that led to Redshift. A similar problem in transaction processing led to Aurora and various other things.
Corey: You also said a couple of minutes ago that Amazon tends to talk more about features than they do about products, and building a product at a startup is a foundationally different experience. I think you’re absolutely on to something there. Historically, Amazon has folks get on stage at re:Invent and talk about this new thing that got released, and it feels an awful lot like a company saying, “Yeah, here’s some great bricks you can use to build a house.” “Well, okay. What kind of house can I build with those bricks?” “Here to talk about the house that they built is our guest customer speaker from Netflix.”
And it seems like they sort of abdicated, in many respects, the storytelling portion to a number of their customers. It is a very rare startup that has the luxury of being able to just punt on building a product and its product story that goes along with it. Have you found that your time at Amazon made storytelling something that you wound up missing a bit more, or retelling stories internally that we just don’t get to see from the outside, or is, “Oh, wow.
I never learned to tell a story before because at Amazon, no one does that, and I have to learn how to do that now that I’m at a startup again?”
Anurag: No, I think it really is a storytelling experience. I mean, it’s a narrative-based culture there, which is, in many ways, a storytelling experience. So, we were trying to provide a set of capabilities so that people could build their own things, you know, much as Kindle allows people to self-publish books; we’re not really writing books of our own. And so I think that was the experience there. Outside, you are trying to solve more holistic problems, but you’re still only a puzzle piece in the experience that any given customer has, right? You don’t satisfy all of their needs, you know, soup to nuts.
Corey: And part of the challenge too, is that if I’m a small, scrappy startup, trying to get something out the door for the first time, the problems that I’m experiencing and the challenges that I have are radically different than something that has attained hyperscale and now has whole optimization stories or series of stories going on. It’s, will this thing even work at all is my initial focus. And in some ways, it feels like conference-ware cuts against a lot of that because it’s hard not to look at the aspirational version of events that people tell on stage at every event I’ve ever seen, and not come away with a takeaway of, “Oh. What I’ve built is actually terrible, and depressing, and sad.” One of the things that I find that resonates about what you’re building over at Shoreline is, it’s not just about the build things from scratch and get them provisioned for the first time. It’s about the ongoing operationalization, I think—if that’s a word—about that experience, and how to wind up handling the care and feeding of something that exists and is running, but is also subject to change because all things are continually being iterated on.
Anurag: That’s right.
I feel like operation is sort of an increasingly important but underappreciated part of the service delivery experience much as, maybe, QA was a couple of decades ago. And over time we’ve gone and we built pipelines to automate our test infrastructure, we have deployment tools to deploy it, to configure it, but what’s weird is that there are two parts of the puzzle that are still highly manual: developing software and operating that software in production. And the other thing that’s interesting about that is that you can decide when you are working on developing a piece of code, or testing it, or deploying it, or configuring it. You don’t get to decide when the disk goes down or something breaks. That’s why you have 24/7 on-call.
And so the whole point of Shoreline is to break that into two problems: the things that are automatable, and make it easy, even trivial, to automate those things away so you don’t wake up to do something for the tenth time; and then for the remaining things that are novel, to make diagnosing and repairing your fleet as simple and straightforward as diagnosing and repairing a single box. And we do a lot of distributed systems [techs 00:16:01] underneath the covers to make that the case. But those are the two things that we do, and so hopefully that reduces people’s downtime and it also brings back a lot of time for the operators so they can focus on higher-value things, like working with you to reduce their AWS bill.
Corey: Yeah, for better or worse, working on the AWS bill is always sort of a backseat function, or a backburner function, it’s never the burning priority unless things have gone seriously awry. It’s a good governance thing; it’s the idea of where, let’s optimize this fixed unit economics. It is rarely the number one most pressing area of business for a company. Nor should it be; I think people are sometimes surprised to hear me say that.
You want to be reasonable stewards of the money entrusted to you and you obviously want to continue to remain in business by not losing money on everything you sell, but trying to make it up in volume. But at some point, it’s time to stop cutting and focus instead on revenue growth. That is usually the path to success for almost every company I’ve ever spoken to, unless they are either very out of kilter, or in a very strange spot in the industry.
Anurag: That’s true, but it does belong, I think, in the ops function to do optimization of your experience, whether—and, you know, improving your resources, improving your security posture, all of those sorts of things fall into production ops landscape, from my perspective. But people just don’t have time for it because their fleets are growing far, far faster than their headcount is. So, the only solution to that is automation.
Corey: And I want to talk to you about that. Historically, the idea has been that you have monitoring—or observability these days, which I consider to be hipster monitoring—figuring out what’s going on in your environment. Then you wind up with incidents being declared when certain things wind up triggering, which presumably are things that actually matter and not, you’re waking someone up for vague reasons like ‘load average is high on these nodes,’ which tells you nothing in isolation whatsoever. So, you have the incident management portion of that [next 00:18:03], and that handles a lot of the waking folks up and getting everyone onto the call. You’re focusing on, I guess, a third tranche here, which is the idea of incident automation. Tell me about that.
Anurag: That’s exactly right. So, having been in the trenches, I never got excited about one more dashboard to look at, or someone routing a ticket to the right person, per se, because it’ll get there, right?
Corey: Oh, yeah.
Like, one of the most depressing things you’ll ever see in a company is the utilization numbers from the analytics on the dashboards you build for people. They look at them the day you build them and hand it off, and then the next person visiting it is you while running this report to make sure the dashboard is still there.
Anurag: Yeah. I mean, they are important things. I mean, you get this huge sinking feeling something is wrong and your observability tool is also down like CloudWatch was in some large-scale events. Or if your ticketing system is down and you don’t even notify somebody and you don’t even know to wake up. But what did excite me—so you need those things; they’re necessary, but they’re not sufficient.
What I think is also needed is something that actually reduces the number of tickets, not just lets you observe them or find the right person to act upon it. So, automation is the path to reducing tickets, which is when I got excited because that was one less thing to wake up on that gave me more time back to wo—do things, and most importantly, it improved my customer availability because any individual issue handled manually is going to take an hour or two or three to deal with. The issue being done by a computer is going to take a few seconds or a few minutes. It’s a whole different thing. It’s the difference between a glitch and having to go out on an apology tour to your customers.
Corey: I really love installing, upgrading, and fixing security agents in my cloud estate! Why do I say that? Because I sell things, because I sell things for a company that deploys an agent, there’s no other reason. Because let’s face it. Agents can be a real headache. Well, now Orca Security gives you a single tool that detects basically every risk in your cloud environment, and that’s as easy to install and maintain as a smartphone app.
It is agentless, or my intro would’ve gotten me into trouble here, but it can still see deep into your AWS workloads, while guaranteeing 100% coverage. With Orca Security, there are no overlooked assets, no DevOps headaches (and believe me, you will hear from those people if you cause them headaches), and no performance hits on live environments. Connect your first cloud account in minutes and see for yourself at orca.security. That’s “Orca” as in whale, “dot” security as in that thing your company claims to care about but doesn’t until right after it really should have.
Corey: Oh, yes. I feel like those of us who have been in the ops world for long enough, we always have a horror story or two of automation around incidents run amok. A classic thing that we learned by doing this, for example, is if you have a primary and a secondary, failover should be automated. Failing back should not be, or you wind up in these wonderful states of things thrashing back and forth. And in many cases in data center land, if you have a phantom router ready to step in, if the primary router goes offline, more outages are caused by a heartbeat failure between those two devices, and they both start vying for power.
And that becomes a problem. Same story with a lot of automation approaches. For example, if oh, every time a disk winds up getting full, all right, we’re going to fire off something to automatically expand the volume. Well, without something to stop that feedback loop, you’re going to potentially wind up with an unbounded growth problem and then you wind up with having no more disks to expand the volume to, being the way that winds up smacking into things. This is clearly something you’ve thought about, given that you have built a company out of this, and this is not your first rodeo by a long stretch. How do you think about those things?
Anurag: So, I think you’re exactly right there, again.
So, the key here is to have the operator, or the SRE, define what needs to happen on an individual box, but then provide guardrails around them so that you can decide, oh, a lot of these things have happened at the same time; I’m going to put a rate limiter or a circuit breaker on it and then send it off to somebody else to look at manually. As you said, like failover, but don’t flap back and forth, or limit the number of times that something is allowed to fail before you send it [unintelligible 00:21:44]. Finally, everything grounds at a human being looking at something, but that’s not a reason not to do the simple stuff automatically because wasting human intelligence and time on doing just manual stuff again, and again, and again, is pointless, and also increases the likelihood that they’re going to cause errors because they’re doing something mundane rather than something that requires their intelligence. And so that also is worse than handing it off to be automated.
But there are a lot of guardrails that can be put around this—that we put around it—that is the distributed systems part of it that we provide. In some sense, we’re an orchestration system for automation, production ops, the same way that other people provide an orchestration system for deployments, and automated rollback, and so forth.
Corey: What technical stacks do you wind up supporting for stuff like this? Is it anything you can effectively SSH into? Does it integrate better with certain cloud providers than others? Is it only for cloud and not for folks with data center environments? Where do you start? Where do you stop?
Anurag: So, we have started with AWS, and with VMs and Kubernetes on AWS. We’re going to expand to the other major cloud providers later this year and likely go to VMware on-prem next year. But finally, customers tell us what to do.
Corey: Oh, yeah.
Looking for things that have no customer usage is—that’s great and all, but talking to folks who are like, “Yeah, it’d be nice if it had this.” “Will you buy it if it does?” “No.” “Yeah, let’s maybe put that one on the backlog.”
Anurag: And you’ve done startups, too, I see that.
Corey: Oh, once or twice. Talk to customers; I find that’s one of those things that absolutely is the most effective use of your time you can do. Looking at your site—Shoreline.io for those who want to follow along at home—it lists a few different remediations that you give as examples. And one of them is expanding disk volumes as they tend to run out of space. I’m assuming from that perspective alone, that you are almost certainly running some form of Agent.
Anurag: We are running an Agent. So, part of that is because that way, we don’t need credentials so that you can just run inside the customer environment directly and without your having to pass credentials to some third party. Part of it is also so you can do things quickly. So, every second, we’ll scrape thousands of metrics from the Prometheus exporter ecosystem, calculate thousands more, compare them against hundreds of alarms, and then take action when necessary. And so if you run on-box, that can be done far faster than if you go off-box.
And also, a lot of the problems that happen in the production environment are related to networking, and it’s not like the box isn’t accessible, but it may be that the monitoring path is not accessible. So, you really want to make sure that the box can protect itself even if there’s some issues somewhere in the fleet. And that really becomes an important thing because that’s the only time that you need incident automation: when something’s gone wrong.
Corey: I assume that Agent then has specific commands or tasks it’s able to do, or does it accept arbitrary command execution?
Anurag: Arbitrary command execution.
Whatever you can type in at the Linux command prompt, whether it’s a call to the AWS CLI, kubectl, Linux commands like top, or even shell scripts, you can automate using Shoreline.
Corey: Yeah. That was one of the ways that Nagios got it wrong, once upon a time, with their NRPE, their Nagios Remote Plugin Executor, where you would only be allowed to run explicit things that had been pre-approved and pushed out to things in advance. And it’s one of the reasons, I suspect, why remediation in those days never took off. Now, we’ve learned a lot about observability and monitoring, and keeping an eye on things that have grown well beyond host-based stuff, so it’s nice to see that there is growth in that. I’m much more optimistic about it this time around, based upon what you’re saying.
Anurag: I hope you’re right because I think the key thing also is that I think a lot of these tools vendors think of themselves as the center of the universe, whereas I think Shoreline works the best if it’s entirely invisible. That’s what you want from a feedback control system, from an automation system: that it just gives you time back and issues are just getting fixed behind the scenes. That’s actually what a lot of AWS is doing behind the scenes. You’re not seeing something whenever some rack goes down.
Corey: The thing that has always taken me aback—and I don’t know how many times I’m going to have to learn this lesson before it sticks—I fall into the common trap of take any one of the big internationally renowned tech companies, and it’s easy to believe that oh, everything inside is far future wizardry of, everything works super well, the automation is flawless, everything is pristine, and your environment compared to that is relative garbage. It turns out that every company I’ve ever spoken with and taken SREs from those companies out to have way too many drinks until they hit honesty levels, they always talk about it being a sad dumpster fire in a bunch of different ways.
And we’re talking some of the companies that people laud as the aspirational, your infrastructure should be like these companies. And I find it really important to continue to socialize that point, just because the failure mode otherwise is people think that their company just employs terrible engineers and if people were any good, it would be seamless, just like they say on conference stages. It’s like comparing your dating life to a romantic comedy; it’s not an accurate depiction of how the world works.
Anurag: Yeah, that’s true. That said, I’d say that, like, the average DBA working on-prem may be managing a hundred databases; the average DBA in RDS—or somebody on call—might be managing a hundred thousand.
Corey: At that point, automation is no longer optional.
Anurag: Yeah. And the way you get there is, every week you squash and extinguish one thing forever, and then you start seeing less and less frequent things because one in a million is actually occurring to you. But if it was one in a hundred, that would just crush you. And so you just need to, you know, very diligently every week, every day, remove something. Yeah, Shoreline is in many ways the product I wish I had had at AWS because it makes automating that stuff easy, a matter of minutes, rather than months. And so that gives you the capability to do automation. Everyone wants automation, but the question is, why don’t they do it? And it’s just because it takes so much time and we’re so busy, as operators.
Corey: Absolutely. I don’t mean to say that these large companies working at hyperscale have not solved for these problems and done truly impressive things, but there’s always sharp edges, there’s always things that are challenging and tricky. On this show, we had Dr. Christina Maslach recently as an expert on burnout, given that she spent her entire career studying occupational burnout as an academic.
And it turns out that it’s not—to equate this to the operations world—it’s not waking up at two in the morning to have to fix a problem—generally—that burns people out. It’s being woken up to fix a problem at 2 a.m. consistently, and it’s always the same problem and nothing ever seems to change. The worst ops jobs I’ve ever seen are the ones where you have to wake up to fix a thing, but you’re not empowered to actually fix the cause, just the symptom.
Anurag: I couldn’t agree more and that’s the other aspect of Shoreline is to allow the operators or SREs to build the remediations rather than just put a ticket into some queue for some developer to get prioritized alongside everything else. Because you’re on the sharp edge when you’re doing ops, right, to deal with all the consequences of the issues that are raised. And so it’s fine that you say, “Okay, there’s this memory leak. I’ll create a ticket back to dev to go and fix it.” But I need something that helps me actually fix it here and now. Or if there’s a log that’s filling up my disk, it’s fine to tell somebody about it, but you have to grow your disk or move that log off the disk. And you don’t want to have to wake up for those things.
Corey: No. And the idea that everything like this gets fixed is a bit of a misnomer. One of my hobbies is whenever a site goes down and it is uncovered—sometimes very publicly, sometimes in RCAs—that the actual reason everything broke was due to an expired certificate.
Anurag: Yep.
Corey: I like to go and schedule out a couple of calendar reminders on that one for myself, of check it in 90 days, in case they’re using a refresh from Let’s Encrypt, and let’s check it as well in one year and see if there’s another outage just like that.
It has a non-zero success rate because as much as we want to convince ourselves that, oh, that bit me once, and I’ll never get bitten like that again, that doesn’t always hold true.
Anurag: Certificates are a very common source of very widespread outages. And it’s actually one of the remediations we provide out of the box. So, alongside making it possible for people to create these things quickly, we also provide what we call Op Packs, which are basically getting started things which have the metrics, alarms, actions, bots, so they can just fix it forever without actually having to do very much other than review what we have done.
Corey: And that’s, on some level, I think, part of the magic is abstracting away the toil so that people are left to solve interesting problems and think about these things, and guiding them down a path where, okay, what should I do on an automatic basis if the disk fills up? Well, I should extend the volume. Yeah. But maybe you should alert after the fifth time in an hour that you have to extend the same volume because—just spitballing here—maybe there’s a different problem here that putting a bandaid on isn’t going to necessarily solve. It forces people to think about what are those triggers that should absolutely result in human intervention because you don’t necessarily want to solve things like memory leaks, for example, oh our application leaks memory so we have to restart it once a day.
Now, in practice, the right way to solve that is to fix the application. In practice, there are so many cron jobs out there that are set to restart things specifically for that reason because cron jobs are quick and easy and application developer time is absolutely not easy to come by in many of these shops. It just comes down to something that helps enforce more of a process, more of a rigor. I like the idea quite a bit; it aligns both with where people are and how a better tomorrow starts to look.
I really do think you’re onto something here.
Anurag: I mean, I think it’s one of these things where you just have to understand it’s not either-or, that it’s not a question of operator pain or developer pain. It’s, let’s go and address it in the here and now and also provide the information, also through an automated ticket generation, to where someone can look to fix it forever, at source.
Corey: Oh, yeah. It’s always great for the user experience, too. Having those tickets created automatically is also sometimes handy because the worst way to tell someone you don’t care about their problem when they come to you in a panic is, “Have you opened a ticket?” And yes, of course, you need a ticket to track these things, but maybe when someone is ghost pale and scared to death about what they think just broke the data, maybe have a little more empathy there. And yeah, the process is important, but there should be automatic ways to do that. These things all have APIs. I really like your vision of operational maturity and managing remediation, in many cases, on an automatic basis.
Anurag: I think it’s going to be so much more important in a world where deployments are more frequent. You have microservices, you have multiple clouds, you have containers that give a 10x increase in the number of things you have to manage. There’s a lot for operators to have to keep in their heads. And things are just changing constantly with containers. Every minute, someone comes and one goes. So, you just really need to—even if you’re just doing it for diagnosis, collecting it and putting it aside is really critical.
Corey: If people want to learn more about what you’re building and how you think about these things, where can they find you?
Anurag: They can reach out to me on LinkedIn at awgupta, or of course, they can go to Shoreline.io and reach out there, where I’m also anurag@Shoreline.io if they want to reach out directly.
And we’d love to get people demos; we know there’s a lot of pain out there. Our mission is to reduce it.
Corey: Thank you so much for taking the time to speak with me today. I really appreciate it.
Anurag: Yeah. This was a great privilege to talk to you.
Corey: Anurag Gupta, CEO and founder of Shoreline.io. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment telling me that I’m wrong and that Amazonians are the best at being on call because they carry six pagers.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
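A minimal sketch of the guardrail idea discussed in this episode: run a remediation automatically, but stop and hand off to a human once it has fired too often within a time window (Corey's example is alerting after the fifth volume extension in an hour). This is not Shoreline's implementation or API; the class name `GuardedRemediation`, its parameters, and the five-per-hour default are assumptions made purely for illustration.

```python
from collections import deque
from typing import Callable, Deque


class GuardedRemediation:
    """Hypothetical rate-limited remediation: automate the fix, escalate when it repeats."""

    def __init__(
        self,
        action: Callable[[], None],      # the automated fix, e.g. extend a volume
        escalate: Callable[[], None],    # hand-off to a human, e.g. page on-call
        max_runs: int = 5,               # assumed limit on automatic runs per window
        window_seconds: float = 3600.0,  # assumed window length (one hour)
    ) -> None:
        self.action = action
        self.escalate = escalate
        self.max_runs = max_runs
        self.window_seconds = window_seconds
        self._runs: Deque[float] = deque()  # timestamps of recent automatic runs

    def trigger(self, now: float) -> str:
        """Handle one alarm at time `now` (seconds); returns what was done."""
        # Forget automatic runs that have aged out of the window.
        while self._runs and now - self._runs[0] >= self.window_seconds:
            self._runs.popleft()

        # Too many recent runs: stop the feedback loop and involve a person.
        if len(self._runs) >= self.max_runs:
            self.escalate()
            return "escalated"

        self._runs.append(now)
        self.action()
        return "remediated"


if __name__ == "__main__":
    guard = GuardedRemediation(
        action=lambda: print("extending volume"),
        escalate=lambda: print("paging a human"),
    )
    # Six disk-full alarms, one minute apart.
    for minute in range(6):
        print(guard.trigger(now=minute * 60.0))
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock and persist the run history somewhere that survives restarts.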
About Michael
Michael Garski is the Director of Platform Engineering at Fender Musical Instruments, where he leads the teams responsible for service development & testing, devops, and data. He’s been with Fender for over 5 years and prior to that worked as a software engineer & architect on back-end systems at Viant, MySpace, Countrywide Home Loans & Fandango. He is passionate about application reliability and observability and their impact on customer satisfaction.
Links:
LinkedIn: https://www.linkedin.com/in/mgarski/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report brought to you annually by Puppet since 2011 to explore the trends and blockers keeping evolution firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translate to DevOps success. All that and more awaits you. Visit: www.puppet.com to download your copy of the report now!
Corey: If you’re familiar with Cloud Custodian, you’ll love Stacklet.
Which is made by the same people who made Cloud Custodian, but put something useful on top of it so you don’t need to be a YAML expert to work with it. They’re hosting a webinar called “Governance as Code: The Guardrails for Cloud at Scale” because it’s a new paradigm that enables organizations to use code to manage and automate various aspects of governance. If you’re interested in exploring this you should absolutely make it a point to sign up, because they’re going to have people who know what they’re talking about—just kidding they’re going to have me talking about this. It’s going to be on Thursday, July 22nd at 1pm Eastern. To sign up visit snark.cloud/stackletwebinar and I’ll talk to you on Thursday, July 22nd.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. We talk to a lot of people here on this show who are deep in the weeds of SaaS companies, or cloud vendors, or cloud vendors cosplaying as SaaS companies. Today, we’re taking a bit of a different direction. My guest is Michael Garski, Director of Platform Engineering at Fender Musical Instruments. They make guitars among many other things. Michael, thank you for joining me.
Michael: Oh, thanks for having me on, Corey.
Corey: So, one of the things that I really appreciate about what you do as a company is I can, at least presumably, explain it to someone who is not super deep in technical weeds without 45 minutes of explainer first. The easy answer is, “Oh, Fender. You folks make guitars.” These days, no one just does one thing, I have to imagine. How do you describe what the company does?
Michael: Oh, well, to quote Leo Fender, his view was that artists are angels and it’s our job to give them wings. So, in addition to actually making and developing guitars and amplifiers, we’ve branched off into consumer-facing products to actually teach people how to play those instruments.
Corey: You folks have been relatively outspoken about the various things you’re doing at different AWS events.
I mean, my approach to that tends to be that if AWS is great at making bricks that you can use to build amazing things with, “Well, great, can you draw a picture of the house that you can build with this?” “No, we’re going to have a customer come out and talk about that stuff instead.” You folks have been focusing on a lot of serverless work, and you’ve been very public about the fact that you are almost entirely serverless-driven in terms of architecture if I’m not mistaken.
Michael: That is true.
Corey: Tell me about that. How did you get there and what brought it about?
Michael: So, I work in the digital division in Fender. We started, let’s see, we’re coming up on five years I’ve been there. So, what we did was, initially, we started building services that could run within a container, or on an EC2 instance, but we started looking at Lambda functions. We had a need to ingest a product catalog, so the IT team was able to drop us off a product catalog into an S3 bucket, and the easiest thing to do then was just trigger a Lambda function to then process that file. And it just kind of snowballed from there.
Corey: I think the common problem when people hear ‘serverless’ is they think, “Oh, great. More discussions about Lambda functions.” And Lambda is almost getting something of a tarred reputation in some circles because when we can build amazing things with it ourselves, we love it, but when we ask AWS how to wind up integrating two services, or about a feature gap, their response is, “Oh, use a Lambda function for it.” It starts to feel like they’re using it as spackle and the spackle has become load-bearing. Do you view serverless as being purely function-driven or is it broader than that?
Michael: It’s much broader than that. Serverless is a mindset where you’re looking beyond just Lambda functions to using a lot of third-party services so that you can actually focus on your core business. 
Like, we use Zuora as a subscription provider for web-based subscriptions; we use Algolia for full-text search; we use a variety of other services so that we can just focus on the core business.
Corey: One thing that’s been on everyone’s mind, somewhat recently, has been the idea of dramatic changes as far as user behavior goes. And in the more traditional environments where you see things like EC2 instances or on-premises data centers, back when the pandemic first hit, companies that were very focused on a model of business that aligned directly with people behaving in certain ways that they suddenly didn’t would see 80% drop-offs or more in their user traffic, but their infrastructure spend just kept hanging out exactly where it was, in a straight line. So, at some level, it feels like yes, the whole point of cloud is that it can be elastic, except no one builds it that way for a variety of reasons. When COVID hit, what changed for your business?
Michael: The change for our business is we launched a program called Playthrough. We did this about a year ago; we started it, and we gave away three months of Fender Play for free. It was a single-use code that a user would redeem and no credit card required, and over a period of five days, we saw our traffic increase by more than ten times. And we had very little changes we needed to make. Everything scaled up, we had no issue with—we used a lot of Lambda functions, DynamoDB, everything just scaled up fine. The only point that became a bottleneck was our Elasticsearch cluster. However, beefing up the nodes and adding a few more nodes resolved that issue immediately.
Corey: So, I’m going to go out on a limb and postulate that you folks increased pickup when the lockdowns hit, if for no other reason than, “Well, I’m trapped at home and I’m tired of staring at the guitar on the wall. I may as well learn to play it.” I would guess. I could be way off base on that.
Michael: No, no, that’s very true. 
Even since then, even after that program has expired—of course, not everyone then converts and sticks around—but many, many did, many more than we thought would, did stick around, and our usage and our goals were exceeded for this last year, and we’re in a healthy place, and looking at continuing to grow and expand in the future.
Corey: So, one of the applications that I think gets a fair bit of attention—rightfully so—lately, is something called Fender Play, and as best I can tell, that is an app that works in web, it works on mobile, and it’s a video-based instruction tool for guitar at least, but some other instruments as well. How did that come to be? Did that exist before COVID hit? Has that been something that’s been in the works for a while? Or was it, “Well, we’re going to do a two-week sprint and build this thing from scratch?”
Michael: No, we launched that—this June we’re coming up on the fourth anniversary since it’s been launched, so we launched this in summer of 2017.
Corey: One of the problems I’ve always found is that it’s challenging to learn to do something that is as, I guess, physical and intricate, et cetera, as playing an instrument without having someone in the room looking at you and smacking you with a stick whenever you do things that are wrong. “Nope, that’s a bad habit. If you keep doing that it’s going to hurt you.” How do you approach that as a company from a non-interactive perspective of someone who’s going to watch a video and do things and maybe it’ll work, maybe it won’t? Particularly in light of things like, well, the competition is YouTube, which, you know, I’m going to roll the dice and sometimes I’ll see a great tutorial, sometimes I’ll see one that I don’t realize is teaching me terrible things, and then it’s going to recommend some baseless conspiracy theory because YouTube. How do you differentiate that? What makes Fender Play different?
Michael: So currently, you’re right; it’s just a video-based instruction app. 
There’s not any way to, like, provide direct feedback to students within the web and mobile applications. However, we do have an online community, and our Fender Play instructors do an office hours feature where they’ll actually answer questions live and talk to students. We are investigating and doing some early research into possibly being able to provide that type of feedback to users, but it’s a very challenging problem, just due to the nature of you’re playing an instrument that has multiple strings, so you’re trying to pick out the chord that they’re playing in, and the timing. But it’s something we definitely need to add.
Corey: There’s something to be said as well for the kind of care and attention that you folks wind up putting into your media where, “This is how you finger a chord,” and someone on the YouTube video will do it for two-tenths of a second, and they’re filming it with a potato that isn’t focused properly and pointing at the wrong part of the guitar. You folks have a high bar for quality on this. Is that done in-house? Do you wind up just going through a bunch of random folks that you just wind up offering a bunch of gift cards to, or free guitars to do this? How does the program work on the back end?
Michael: So, we have an in-house curriculum team that puts together the lesson plans to really help people learn in small bite-sized lessons so that it’s not too overwhelming at once. And that curriculum then is shot and filmed by an in-house video team that puts that together; they upload the data into S3 for the final cut, then that gets transcoded via MediaConvert, and we serve it up via CloudFront.
Corey: It’s rare to wind up talking to a company that is something of a household name about something that they’re doing, and hear the AWS services that they’re using not trend toward a baseline mean if I can be so bold. Normally, you’ll see some of the case studies, like, “Oh, this is an online bank. 
What services are they using?” “Oh, they’re using EC2, and S3, and load balancing because did you miss the part where it’s a bank?” They’re not going to use these far-future services due to regulatory risk, among other things, in many cases.
You’re using Elemental MediaConvert, which is one of those relatively high-up-the-stack offerings that isn’t broadly known. It’s one of those services that is focused on specific use cases and specific industry verticals in a way that a baseline primitive service isn’t. What does MediaConvert do?
Michael: What it does is it takes the final edit of the video, and we have several different presets so that it will put it into an HLS format with different bitrates so that the user is getting the best quality video depending on their bandwidth.
Corey: When I looked into it in the early days when it was first launching, I found that it looked an awful lot like Elastic Transcoder, which is a service that they’ve had for a while, only they changed up some of the capabilities. It’s obviously far more capable as a service, but they also added something that felt like 15 different billing dimensions to it, “So, what is this going to cost me?” “Well, we’re going to run it for a month and find out if we’re still in business.” And it seemed like it was one of those very difficult to get started with and run experiments with services. Now, obviously, services evolve over time. When you started looking into it was that experience roughly akin to what you felt, or am I completely and unfairly slandering the product?
Michael: We actually started out using Elastic Transcoder and then moved over to MediaConvert, I believe it was last year. We found it to be a little bit easier to use, and the pricing overall in transcoding the videos for us is really a drop in the bucket as compared to actually hosting them and serving them up via CloudFront. 
And when we switched over to MediaConvert, we adjusted our settings to lower the maximum bitrate for a given video; we found that after a certain point, the quality to the user just doesn’t really improve, and yet we’re paying to serve the larger video.
Corey: One statistic that I found was that in March of 2020—you know which I believe we’re still in at this point; just, it’s the Endless September model, applied to March—you wound up seeing over an order of magnitude in traffic increase within five days, and looking at that through a lens of traditional architecture, that means that nobody sleeps a whole heck of a lot. Given that you’re in on the serverless story, and you have been since before that hit, what was that scaling experience like for you?
Michael: Scaling experience was completely seamless. We use a lot of Lambda, DynamoDB, Kinesis, SNS, to glue things together, and no problems whatsoever. Just had to bump up our Elasticsearch cluster a bit, that was really the only thing because we saw some latency starting to rise on some of our APIs.
Corey: Let me ask the uncomfortable question then because whenever I tried to scale things up quickly in a cloud environment, what was your experience with smacking into various AWS service limits as the traffic grew?
Michael: Initially, we actually requested some service limit increases to make sure we weren’t hitting the concurrent Lambda invocation limit, and same thing with Cognito, making sure that we weren’t going to hit any limits as far as sign-ins and things like that. So, we were able to just put in requests, and they gave us a pretty quick turnaround time on that, as well.
Corey: It really does seem like there’s a strong benefit on the serverless space, but I had to double-check before we started recording that you do, in fact, work at Fender because you are a staunch advocate for observability. 
And usually, when someone is that passionate about observability, you can guess that they work at an observability-slash-monitoring company. It’s akin to the idea of someone selling mattresses telling you that mattresses are great and you should have four of them. You’re on the customer side of that and still very passionate about it. Where’d that come from?
Michael: Came from my time years ago, when I worked at MySpace—if anyone can still remember that—working on the search systems there. And as the company started winding down and laying people off, and being one of the only people left working on those systems, being able to know and understand them, you just have to, so you have to continue to monitor and find ways to monitor, and that really ingrained how important instrumentation is and being able to really understand the health of your application as it’s running so that you can see, yes, everything is good, and then when something doesn’t look right so that you can know where to start looking, and you can be alerted of a problem.
Corey: So, I tend to view the world in olden terms where monitoring was what we did, and we used something like Nagios, which was the second-worst option out there because everything else felt like it was tied for first. I also take a somewhat regressive view that observability is to monitoring as DevOps is to being a systems administrator. It’s the same thing, but by using the more modern terminology, you can charge more for it. I’m going to go out on a limb and guess that you take a somewhat contrarian [laugh] view to that.
Michael: Yes, yes, I do. It’s about really understanding how your application is running. It’s not just looking at, oh, how many HTTP 500s am I serving up per hour, if I hit a threshold for the last hour? It’s a lot more than that. It’s really being able to really dig in and see what the issue is or what’s working really well.
And to that end, we rely on two services for this. We use Honeycomb and Epsagon. 
Honeycomb, kind of, acts as our top layer because it gives us the really good high-cardinality metrics where I can punch in a user ID and I can see all the API traffic that this user has performed. As well as, even just like when we launched Playthrough and our traffic rose, the reason we discovered that our latency was rising was due to a service-level objective being triggered in Honeycomb on latency. And we were able to respond to that using that before customers really noticed anything at all.
Corey: As an Epsagon customer myself, I’m always conflicted when I find myself going into their service and using it to figure out what the heck’s going on with my giant pile of Lambda functions, and API gateways, and whatnot, wired together because the experience is uniformly excellent, but I’m also frustrated in that it needs a third party to even begin to allude to what’s going on. It feels, on some level, like the vendor that is providing this service to me should be reasonably effective at telling me what it’s doing, and when it’s breaking. I understand that how I wish the world is and how it actually is are two radically different things but does that ever strike you as well?
Michael: Whether or not AWS should be providing that type of level, that seems… that seems like more of a service that you can have competition and other vendors that really specialize and get in the weeds on it. I don’t think AWS needs to provide every service you could possibly use for your application. That’s not something I’m too concerned about. I don’t really even think it’s their place, frankly.
Corey: No, no, I understand. The problem I keep running into, on some level, whenever I try and diagnose it natively is, I look at CloudWatch and it’s difficult to understand that is this—in my case because again, I’m still early days with a lot of these things—is it the API gateway that’s having the problem? Is it the CloudFront distribution that is tied to that? 
Is it the Lambda function? Where’s the handoff?
Trying to understand where in a complicated application the failure is occurring is a challenge. And let’s be clear, most of that is a problem of my own making because I didn’t have the good sense to instrument this thing in a reliable repeatable way when I built it. It feels like everything is tied together with duct tape, and baling wire, and spit, and a bit of luck. As a counterpoint, the more companies I talk to, the more I realize that no, no, this is actually how most people feel [laugh] when they look at things that are working. It’s, yeah, it’s terrible. It’s a trash fire, but it makes money so we’re going to roll with it.
And there’s always, on some level, a sense of what we’ve built is very far from the platonic ideal of what we should have built. Does that resonate with you, or do you take a step back and look at what you’ve achieved with a perspective of, “This is awesome. More people should do it exactly like this.” And honestly, if it’s that one, I’d love to take a look at what you’ve built.
Michael: I think there’s always room for us to improve on what we’re doing because we’re constantly learning and evolving to improve both, even at such a low level of like, “Okay, how do we lay out the files in our service repository to make the best organization to make sense?” All the way up to, “Okay, how are we going to do tracing? And what kind of information do we need to get from that so that we can find problems when they occur?” We’re always looking to learn what others are doing, and talking to others in this space. No one will ever be a hundred percent right. There’s always room for improvement everywhere.
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. 
What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Corey: One thing that you folks have done that I think was really interesting and didn’t get as much play as I think it really deserved, was that, especially in the early days of the pandemic, you wound up seeing that massive increase due to giving out almost a million free three-month subscriptions to Playthrough. Additionally, you also worked closely with LAUSD, the Los Angeles Unified School District, to add Fender Play to their middle school music program’s curriculum to help supplement their remote learning programs. First, was that all in the same timeframe? Or—and, two, what has it been like, I guess, working with an organization that is, I guess, on some level, not particularly cloud-first. I would imagine. When I lived in Los Angeles, I never got the sense that LAUSD was full-on serverless, full on-board with cloud, full on-board with remote learning. And then the pandemic of course exacerbates all of that.
Michael: Yeah, so those were really two different projects. So, the Playthrough project started in March, and we started working with Los Angeles Unified School District last year during their summer school program; started out with 1500 students and we put it together very quickly. Essentially, we use the same three-month codes that we used for that Playthrough promotion so that we could set things up very quickly for students and gave out, through our nonprofit arm of Fender, the Fender Play Foundation, gave out 1500 instruments to these students to use during the summer school program. 
And that program became so successful, we continued on with them in the fall, and now in the current semester, and we will be again this summer. I believe there’s 7000 students in the program now.
And working with their IT team has actually been quite nice. And in dealing with partners, you wouldn’t think much of, “Oh, it’s a school district, what do they have?” But as far as just ease of working with them, we actually hooked into their SAML provider in Cognito so that LAUSD students could authenticate when they come in through the remote learning systems. And they were great to work with and very helpful and cooperative.
Corey: One of the arguments that you’ll see that comes up against serverless, from time to time, is that you are now indelibly linked to your provider, but you can’t take what you’ve built with all of these services and just move it over to Azure or GCP on a moment’s whim. Now, in practice, people who tend to build for that, just build everything on top of EC2 and very little else, and then run it entirely in AWS and never move it to any of those other places. But was there friction with making that, I guess, architectural commitment to a single vendor?
Michael: Oh, you’re bringing up the vendor lock-in Boogeyman.
Corey: Oh, I absolutely am. Most people who bring that—when I bring it up as a straw man so you can attack it, most people who bring up the vendor lock-in Boogeyman, “Oh, you have to go multi-cloud,” are either trying to sell you something that is required if you want to go multi-cloud, or they’re a cloud provider themselves who know that if you go all-in on one provider, it will certainly not be theirs.
Michael: I think if you properly architect your applications with separations of concerns that you could move to, say—okay, say Lambda wasn’t working out for us anymore, and we needed to take our applications and, where, we’re going to put them into a container, but we’re going to stay in AWS. 
Our applications are set up in such a way that Lambda is basically a deployment pattern. We could easily convert those individual function handlers into route handlers with a minimal effort because the business logic and then the underlying data storage are separated. So, it would be feasible for us if we wanted to, say, move to Azure and use Azure Functions and whatever comparable service they have to DynamoDB. I’m not too familiar with a lot of their offerings.
But that would certainly be possible to do with, obviously, some effort and really, at the end of the day, the resources you have working on the applications are going to end up costing you much more than any, sort of like, software licensing or specific savings you’re going to get from a cloud vendor, so might as well go ahead and just use those services that they’re providing, so that you can just focus on the business.
Corey: My approach has almost universally been that looking at an awful lot of companies and their AWS bills, it is a challenge to find an environment where the resources in the environment cost more than the people who are operating them. In the context of business, AWS bills seem giant and enormous, right up until you look at payroll and then it’s, “Oh, okay.” That’s counterintuitive for folks who are learning this, and I fall prey to it myself: when I’m playing around as a hobbyist trying to build something I value, my time is free because I’m learning as this goes, and then in that context, especially when I was starting out as a student, it was, “Oh, great. So, this winds up costing me $7 a month. Oh, that’s a lot of money. That’s my ramen budget, so I’m instead going to wind up spending eight hours avoiding it charging me anything.” It’s the exact opposite from the direction you want staff that you’re paying to work on these things to go in. 
How do you approach the idea of increasing the cloud cost if it will save time for your team?
Michael: It’s a balance between, where do we need to build this ourselves? And then not only build it, you have to operate it and maintain it? Or what is the cost of getting this third-party service? And that’s really what it comes down to in all of them. And do we actually want to spend time working on this piece of infrastructure that these other people are specializing in and do so well? I’ve got better things I can have people doing than that.
Corey: Speaking of people, one thing that you talk about, as you self-describe, is that you wind up not writing a whole lot of code anymore, but you’re something of a stickler for observability and enforcing consistency between services, so you’ll periodically do things like submit a PR to tweak a log message to put your mind at ease, was one example that you gave. Given that you’re a director, which is generally manager of managers style approaches, how do you avoid having those PRs come across to your team as either micromanagement or a condemnation of what they’ve built? Because I get it; when I see something that’s easy and small to tweak, I want to go ahead and get it fixed immediately. I don’t want to go back and forth and play those games; I just want it done. But I’m also always weighing that against, I don’t want to have people think that I’m judging them somehow for something I’m very much not.
Michael: That’s a very good point. The larger technical decisions on how things are laid out, I generally just try to—I don’t insert myself into. I let the team go ahead, and make those decisions, and leave that direction, and let them take the charge on that, and I take the approach of looking at it as more of a guiding, and mentoring and teaching to really hone and instill that discipline in really being able to understand what the applications are doing. 
And as our team is growing, I have less and less time to even do those things, but I can go through the systems and go, “Hey, how come we’re not tracing this call to the reCAPTCHA servers? Let’s add that in there.” And at this point now, I mainly just write Jira tickets to have someone else actually do the work.
Corey: The more I do this, the more I realize that as complicated as the technology is, the people are in many ways, far more complicated. And let’s be fair here, non-deterministic things that work super well on one person one month could work entirely differently the following month, or even with the same person, or between teams. It’s a constant balancing act, on some level. And giving people a sense of psychological safety has always been the biggest challenge. The thing that surprised me about management, back when I was running ops teams was the more, I guess, responsibility you accrue as you rise from individual contributor into the management—or ‘rise’ is sort of a wrong term; it’s an orthogonal transition—is that you spend a lot more time on the people problems, and your ability to directly control or effect change diminishes because you have to do everything via influence. You get a lot more responsibility with a lot less direct power [laugh] over the outcome in some ways. Does that align with how you see it, or am I just—do I have very strange approaches on management? Which may be true, and why I got out of it as fast as I could.
Michael: No, that is a good point because you are having to [unintelligible 00:27:05], like, influence, and guide, and more take a higher-level view, as opposed to really getting into the weeds of like, “Okay, what methods are we going to put on this interface? How are we going to, say, architect the internals of an application?” Those are details I just really don’t have time for anymore. 
But larger things as to making sure that we’re okay, it’s like, “What’s the performance of this?” And, “Overall, is something that can be adapted as the business needs change, and as we change? And as we learn, what can we do to modify it?” And more just things like guiding, and mentoring, and really taking a higher-level view of that.
Corey: I’m going to selfishly ask about something that I struggle with myself. That goes a bit more into the technical area, but you talk about enforcing consistency across all of your different services. What does that mean? Similar coding style? Similar instrumentation?
Because I look at the things I built and microservices that power my internal nonsense, and each one of those is very different than all the rest. So, whatever your version of consistency is, I know I’m not doing it. But how do you view it?
Michael: So, there’s really two types of consistency. The one I really refer to the most is in observability. So that, if you’ve got a thousand Lambda functions out there, and each one is logging things slightly differently, that’s just a pain to deal with, and realistically, dealing with a thousand unicorns is a real pain. So, through that observability, at least in Lambda, we use an internally developed middleware to make sure that the logging is consistent, and it’s easy enough to use. And then other consistency, like, just within projects of how we lay things out.
That’s something that’s been consistently evolving. What’s the folder structure in how we organize the code? And we’ve kind of been evolving that over the last three years. And within about the last six months, we’ve come up with a really good pattern and a template for the future. And it’s not much different from what we started out with, but it’s a little bit easier, really, to comprehend as a new engineer coming in. 
It makes more sense.
Corey: I have to ask—and I understand if you don’t want to give a particular endorsement in any direction—but do you go through Serverless Framework, SAM CLI, the CDK, using the console and then lying about it? What is the template that you wind up using for that uniformity? Because even internally, I use three or four of those different things and professional advice: don’t do that.
Michael: Let’s see. So, in our development, QA, production environments, infrastructure is all managed with Terraform. Each engineer has their own personal AWS account so that they can work on things there—
Corey: Oh, that makes billing granularity super easy.
Michael: Oh, yes. You can tell who’s got EC2 instances running up for too long. But for the most part, we’ll use Serverless Framework in that regard so the engineer can just deploy into their local environment. Although we are working on ways to reuse the Terraform infrastructure and deploy that. But we have our own build and deployment pipeline that we built using CircleCI, and all of our Lambda functions are in Go.
And so having to compile, say, 20 binaries in a service, that gets kind of slow, so one of our DevOps engineers actually came up with a way to use Lambda to build the Lambdas, so that we can build them all in a distributed parallel fashion during the build process.
Corey: One thing that I do love about the whole serverless approach—and it is a neat part about Lambda—is no two people ever seem to do it quite the same way. You can tie things together in so many different and exciting ways, and it’s fun. It’s almost like a modern version of playing with Lego. And I know that if Jeff Barr is listening, he just perked up at that. But I love the concept that you can take so many different ways to achieve similar outcomes. And it almost gives a bigger sense of creativity in how you approach problems. Has that been your experience?
Michael: Oh, definitely. 
It’s not only the creativity; it’s also the flexibility in how you solve it, and the ability to adapt and evolve as services evolve, or change, or new ones are added. And to the point of AWS, kind of, saying, “Oh, use a Lambda function to do this.” Like, using Lambda functions for customizing behavior of Cognito with the Cognito triggers, is to me, I think, a perfect way to customize the service to do exactly what you need to do.
Corey: I want to thank you so much for taking the time to speak with me today. It’s always appreciated. If people want to hear more about what you have to say and how you view these things or even, possibly, decide to work with you, where can they find you?
Michael: I’m somewhat active on LinkedIn. LinkedIn is the best place to find me. Please go ahead and connect to me; tell me you heard me on the podcast here.
And yes, we are hiring. We have, all within our technical organization, from client, to web, and mobile engineers, data engineers, DevOps, API, we’re always hiring and if we don’t have something right now that fits your experience, let me know that you’re interested and I’ll put you on the list so that when we do have an opening, we’ll reach out right away.
Corey: And we will, of course, include links to that in the [show notes 00:32:20]. Thank you so much for being so generous with your time. I appreciate it.
Michael: Thanks for having me on, Corey. It was nice talking to you.
Corey: Michael Garski, Director of Platform Engineering at Fender Musical Instruments. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment telling me that I’m almost certainly doing that chord incorrectly.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Links:
CTO Whitepaper: Reinventing Enterprise Networks for the Cloud Era
www.alkira.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: If you’re familiar with Cloud Custodian, you’ll love Stacklet, which is made by the same people who made Cloud Custodian, but they put something useful on top of it so you don’t need to be a YAML expert to work with it. They’re hosting a webinar called “Governance as Code: The Guardrails for Cloud at Scale” because it’s a new paradigm that enables organizations to use code to manage and automate various aspects of governance. If you’re interested in exploring this, you should absolutely make it a point to sign up, because they’re going to have people who know what they’re talking about. Just kidding, they’re going to have me talking about this. It’s going to be on Thursday, July 22nd at 1pm Eastern. To sign up, visit snark.cloud/stackletwebinar, and I’ll talk to you on Thursday, July 22nd.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. 
On this promoted episode, we’re returning to something I did a while back on the AWS Morning Brief. I took us on a twelve-week exploration of networking in the cloud and how that wound up impacting how companies do business. Today, my guest is cloud networking evangelist Rasam Tooloee, and he works over at a company called Alkira. Rasam, thanks for joining me.

Rasam: Thank you for having me, Corey. Pleasure.

Corey: So, let’s start with the obvious. What is a cloud networking evangelist? I’ve heard of people evangelizing all kinds of things, some of which make more sense than others, but this is the first evangelism title that actually made me sit up and say, “Ooh, this is relevant to my interests.”

Rasam: That’s funny. To me, what I consider my charter as a cloud networking evangelist is really helping our customers and prospective customers understand that networking the way they’re used to doing it historically, if you look at legacy networking and the way that networking has evolved, specifically in the cloud, is just not sufficiently agile; it’s not sufficiently natively enterprise-grade from a visibility and control and compliance perspective. And that there’s just a better way of doing networking for this cloud era that we’re in. And I find enterprises that I talk to every day struggle with the complexity associated with how to get their properties into the cloud. Many of them have become natively multi-cloud just by default, you know, some through business imperatives and priorities, some through acquisitions, but ultimately in the context of networking, that has led to a whole lot of complexities that they grapple with, and the evangelist in me is looking to help them find the better options that are out there for them.

Corey: In the earlier days, before I got into Cloud, I was deep into configuration management. 
And oh, we manage all of these systems via configuration drift detection, and every time they run, they remediate the drift, and it’s great. Cool, so how do we wind up managing the networking equipment? Well, there’s this thing called RANCID. It’s made out of some horrifying Perl and if you turn on ‘strict,’ the whole thing breaks. And it was this awful sort of dark-ages approach to networking. It felt like the DevOps movement towards agility really didn’t come to networking in any meaningful sense for a while after that. Is that accurate? Or was I just hanging out in the wrong shops?

Rasam: No, that’s absolutely accurate. For sure.

Corey: Your career has been fascinating. You went from Cisco, where you presumably worked on networking because that’s kind of the thing they’re known for, then Salesforce, which is sort of definitionally SaaS, as it says on the tin. You went to cloud with Microsoft for a while, and now you’re at Alkira, where you’re sort of in the perfect center of all three of those things. Tell me a little about how you got to where you are.

Rasam: Yeah. Well, I have my roots in networking. I worked for Cisco, a great company, for a long time, and really got the opportunity to tackle networking from many different facets, both core networking as well as some of the advanced technologies that Cisco forayed into, and absolutely loved the ride and learned so much. And then there came a time where SaaS was clearly the next big wave. 
And being at Cisco, I watched that wave grow from afar, and at a certain point in my career, I decided to take the leap and go to the SaaS leader at the time, Salesforce, and really learn that business from the ground up, both in terms of the underlying constructs of how SaaS business model works, but also the core business that Salesforce has in CRM.And then from there, I transitioned to Microsoft because SaaS was the tip of the spear that launched the cloud revolution, but then there’s a lot more to cloud than just SaaS. And going to Microsoft really helped me to understand that business from the ground up. I’ve always had this inclination of really being intrigued and curious about what’s next and trying to take that next leap before I have the opportunity to really enjoy the uptrend of the ride. I’m looking for the next emerging significant innovation, but what’s interesting is, come full circle, I’m back in networking. Which kind of begs the question of, like, what happened?And to me, what happened was, in Alkira, I find the coming together of everything that is fascinating to me about cloud and SaaS with where I really earned my chops in technology, which is networking, and really solving this problem for this era, this moment in time in a truly innovative way. So, I feel like it’s actually—it may look like full circle, but it’s actually a continuous trend of seeking out the next innovative thing.Corey: Back when I wound up getting into networking, the reason I did it was because first, it was the 2008 financial crisis and no one was hiring; we just had a salary freeze and I was demoralized at my job. But I realized that as a systems administrator, I was always sort of hand waving over the networking pieces. And all right, let’s figure out how this whole thing works. So, I got my CCNA, my Cisco Certified Network Administrator, cert, back in the days when that didn’t have a whole bunch of different derivative adjectives after, telling you exactly what kind. 
And what happened next was that, okay, now I understand it a lot better. Every time I find myself basically scratching my head, trying to figure out exactly what the deal is with something that I’m working on technically, and I don’t understand what’s there, dig deeper into that and you’ll often discover that it makes everything else make a bit more sense. And then came cloud. And now we have cloud networking, and anyone who tells you they understand how cloud networking works is generally lying to you. It feels like it is complexity stacked on top of complexity, and these days, it more or less distills down to you fire up your cloud provider of choice, you click a few buttons in the console, and really hope you did it right. I’m guessing that you have not automated the clicking of the buttons in the console, so how do you folks approach it? What have you done that’s different?

Rasam: So, to build on your point about the complexity of cloud networking, there are a number of reasons why it is so cumbersome and so complex for enterprises to tackle the challenge of cloud networking. One is, it tends to be rather rudimentary in nature, and there’s a lot of manual effort involved, there’s hop-by-hop configuration, you have to do unnatural things to solve for some basic challenges. For example, we often find enterprises service-chaining firewalls so they can have symmetric traffic routing. And they will do things like have a separate path for ingress and egress traffic; the segmentation is extremely hard. So, one part of it is, and this is probably my personal opinion, I don’t think cloud was built with the idea of, “Let’s solve for the networking from the ground up because that’s so important to how people are going to have to manage their compute and storage.” It was all about compute and storage, and networking was kind of an afterthought. And that shows. It shows in the way that you actually have to configure your cloud network. 
Now, the other thing that really complicates it is the fundamentals of networking don’t change, but the way in which the fundamentals are applied and the vernacular that’s used to describe it and the UI that’s used to control it varies from cloud to cloud. So, just because you learn Azure doesn’t mean you know AWS, and doesn’t mean you know GCP. So, if you are becoming multi-cloud, now you’ve got to go learn all this stuff separately, and then actually as an amplification of the challenge that is associated with the hop-by-hop configuration, when you bring up a region, for example, in cloud provider A, that doesn’t mean that you all of a sudden did most of the heavy lifting required for region B; you’ve got to go do the same thing in region B.

Corey: Oh, I had a client that had a very deeply skilled networking team that spent months without much success trying to get Terraform to set up IPsec between GCP and AWS at one point, and ultimately they gave up in disgust. My response to the argument, “Oh, we’re going to go multi-cloud to avoid lock-in,” is, “Well, sorry, you already have lock-in, both in terms of what your staff’s up to speed on, but also the identity model, the security model, and critically, the networking approach.”

Rasam: Yeah, absolutely. And back to your original question of how we do it differently. So, what we have done is really looked at the problem differently through a new way of thinking. Again, this goes back to my prior point that the network isn’t sufficiently agile, and the reason it’s not agile is for all the reasons that I explained. 
And when our founders who come from decades of experience in networking looked at this problem and they looked at the native value proposition of cloud—which in our mind is agility—is the fact that cloud is a competitive imperative these days, that’s where innovation is happening, and we see enterprises every day increase their investment in cloud because if they don’t, they’re going to get left behind; they’re going to get left behind because the next digital disrupter or their competitor is going to do, in cloud, the things that their customers expect, and the things that are required to truly compete in today’s marketplace.So, because it’s a competitive imperative, and at the heart of it is agility, our founders really contrasted what networking was like in the cloud and in the cloud era, which is really largely fragmented, and silos, highly complex, slow to deploy to your point, often CapEx heavy because you’re making substantial investments in things like colocation and dedicated bandwidth. There are a lot of delays and limitations like I talked about in terms of the various constructs between the cloud providers.They contrasted that with DevOps and said, “Okay, look. DevOps is all about automation. It’s about rapid iteration. It’s about abstracting the underlying complexity on an elastic platform that scales with you. You can actually go into cloud with minimum upfront investments, test and iterate, and then scale as you need to with velocity and agility. That’s what cloud is about. That’s the way DevOps has adapted to the constructs of cloud. 
But network isn’t the case, so how do we rethink networking from the ground up, so that it is more in line with the business imperative of why businesses go in the cloud in the first place?”And to do that, what they really did was design a unified fabric that’s a multi-cloud unified fabric that delivers a full stack of networking services that meet the vast majority of the use cases that an enterprise would have from a networking perspective, and does so in a way that’s natively multi-cloud, and does so in a way that natively addresses some of the complexities with things like security, compliance, visibility, control, et cetera.Corey: And I’ve been very vocal about opposing multi-cloud as a best practice, and people sometimes are surprised to discover that as soon as I find a customer who’s doing multi-cloud, I dive right into discussions about that, and, “We thought you were going to yell at us.” Look, do I think it’s a best practice in the general sense? No, but you have specific constraints, and you have an environment that is how it is, and sitting here saying, “Oh, you should have made a different series of decisions six years ago,” it turns out is not the most compelling story. And there are always specifics that override general guidance. So, whether I like multi-cloud or not as a guidance perspective, I don’t think that I can intelligently deny the reality that it very much exists in an awful lot of places.And sitting here just trying to be a purist by going through one cloud, whatever it happens to be, and nothing else doesn’t really solve any pain that customers have. Hybrid is and will be a big story for a long time. In my more cynical moments, I tend to view hybrid as, “Well, we tried to do an all-in cloud migration and got stuck halfway through because it turns out, it’s hard to move some things, so we gave up and called it hybrid and now we’re calling it good.” That might be overly cynical, but it takes time to move these things. 
It takes time to wind up wrapping around a bunch of different environments. So, if you have something that makes it a lot, I guess, more straightforward to reason about and around the network layer, that really feels like it’s a great equalizer because that is one of the most differentiated aspects of all the different clouds.

Rasam: Yeah, absolutely. I mean, the proof is in the pudding, right? So, we find the challenge of getting to cloud, getting cloud networking enterprise-ready from a security, governance, compliance perspective, high availability perspective, disaster recovery perspective, to be a monumental challenge. And for an enterprise, it could be an effort of months, or years, sometimes, for a single cloud, much less multi-cloud. And just because you did it with Cloud A doesn’t make Cloud B all that much easier. And I agree with you; I think multi-cloud isn’t necessarily an easy and desirable place to find yourself, but that’s beside the point because enterprises are finding themselves there for a myriad of reasons. It could be business imperatives, partnerships, acquisitions, it just happens. And when it happens, you need the best possible strategies and tools to deal with that. And for us the proof is in the pudding because we’ve had customers be able to contract the amount of time that it would have taken them to get from Cloud A to Cloud B from months and months to a matter of weeks. We can provision something that would take multiple weeks of change control and manual effort, and do it in a matter of hours. So, I don’t want to overstate how much the technology simplifies things, but the technology does literally simplify things that much. 
There’s still business process involved, there’s still change control involved, there’s still the human element of making sure that the change is well orchestrated, but the actual process of getting your cloud networking and multi-cloud networking up and running is simplified in a way that I think you have to see to believe and, you know, the proof is in the pudding, and when we have a chance to actually demonstrate that to our prospective customers, it truly is game-changing.

Corey: It’s clear that you’ve built something that works. You have a laundry list of reference customers on your website, and these are logos and names people recognize. It’s not, “Oh, wow. That sounds like you made half of those up, and weren’t three of those the big evil corporation in some movie somewhere?” No, these are real companies solving real problems. And digging a bit into what you’ve built before you came on the show, it is clear that you folks offer a TCO story that lowers the total cost of ownership, but lies, damn lies, and TCO analyses tend to be the three forms of lies people tell. I’m much more interested in the story of how you accelerate time-to-market because, speaking as someone who focuses on AWS bills and cost reduction, cost reduction always takes a backseat to accelerating features being released. So, there’s a capability story that goes along with this, which it sounds like there very much is. That’s the real win; the fact that it saves money is almost icing on the cake.

Rasam: Yeah, absolutely. You’re right, the Holy Grail is time-to-market, and time-to-market is, for me, very much synonymous with this idea of agility and the ability to pivot, and to get to the next iterative desired outcome for your organization, whatever that may be, quickly. That’s consistent with this idea of velocity, and iterative testing, and scale that the cloud provides. 
For example, recently, I’ve been working with one of our prospective customers whose underlying challenge, really, is, “Look, I’ve already built this really robust infrastructure from a cloud networking perspective. It is really colo-centric; that’s my model for my cloud interconnects, but I am now in a global expansion phase. I need to go to all these new geographies, and if I were to do what I just did to build out my cloud networking footprint, I’m looking at a substantial CapEx investment and a substantial amount of time and runway to get that operational, and I just don’t have the CapEx or the time for that.” So...

Corey: What, they can’t just copy and paste the config from one to the other again and again and again in the true StackOverflow tradition?

Rasam: Or get the circuits dropped in the colo, or get all that hardware delivered, and deal with all the complexities of international customs control, et cetera, et cetera. So, what we bring to them as a value proposition is the fact that our points of presence are virtual; they’re software-defined constructs that run atop the hyperscale cloud provider. We can spin them up anywhere in the world where the hyperscale cloud provider has a footprint, and we are in many regions across the globe. And if we’re not in one, we can get one up and running in a matter of days. And most of that time is actually just spent testing it to make sure that it is operationally viable; the actual provisioning and turn-up of it is very, very quick. So, the ability for us to be a virtual PoP for this particular customer and give them the ability to quickly expand into brand new geos in a way that also concurrently, natively streamlines and simplifies the complexities of cloud networking that we’ve already covered is extremely attractive to them. 
And from a time-to-service perspective, it’s taking their ability to deliver the needed services in the cloud to their business users from something that would have taken months and months to something that can be up and running in a matter of weeks.

Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.

Corey: Can you give me an example of a customer pain point that you’ve resolved? Because, again, you have customers willing to say nice things about you, but one of the challenges I’ve often found with a lot of the, shall we say larger, more enterprise-y offerings is, “Well, what did you actually do for the customer?” And the answer requires two hours and at least 40 PowerPoint slides and at the end, you say you get it just to get the person to stop talking. What is the value, the better outcome that you’ve delivered for a customer?

Rasam: Yeah, sure. So, our customer Koch Industries is a public reference for us; you can check out their story in more depth on our website. They were your traditional enterprise, originally designed for cloud using a hub-and-spoke architecture, which consisted of using the data centers as the focal point for the data center interconnect, cloud interconnect, high-speed bandwidth, private LAN, et cetera, that comprised their overall architecture. 
And over time, they simplified—and I use the word simplified loosely here—but they simplified with a more cloud-native, cloud-transit type of architecture, where they leveraged more of the default capabilities and networking services on cloud, which helped considerably. There was a ramp involved in learning the native-cloud constructs and associated networking and security aspects of that, but over time, they did simplify. They were able to condense their overall provisioning time of a cloud interconnect from what they originally shared with us was eighteen months down to about six, and consolidated across about a dozen transit hubs from a cloud networking perspective. But then, as we discussed previously in the podcast, when they took a step back and looked at it, what they still saw was an enormous level of complexity in networking, an enormous level of complexity in operations, and they still were seeking a better way, a way that was operationally viable in the long run with a lower total cost of ownership, and the ability to really consume networking services in a way that moved at the speed of business in a way that was more in line with the way that we’re using cloud computing storage, and in line with the speed and agility with which their business wanted to move. And that’s where Alkira came into the picture, and there was a real alignment of vision between how they saw their networking strategy moving forward and how Alkira delivered services. Long story short, they are now able to take their planning process down from six months to a matter of weeks, and the actual provisioning process of cloud networking to a matter of hours, sometimes less. And that has brought immense value to their business and to their IT organization, again, in terms of agility, in terms of total cost of ownership, in terms of visibility and control, in terms of governance. 
And another added benefit: historically they were single cloud, AWS, but in the process of their journey with Alkira, the need came up to go into Azure for some Azure-native services in a scenario where the data still resided inside of AWS. That request historically would have been months and months of due diligence to get the environment up and running, and in their case, they were able to do that all within a day because they were already leveraging the Alkira multi-cloud platform. So, a tremendous amount of value for them across a myriad of fronts that, again, have been pivotal to their long-term strategy and how they address cloud networking moving forward.

Corey: If we go back to the early days of cloud, we started off with some of the advanced stuff like, you know, virtual machines (some places called them instances), and there was a lot of competitive variation between them. “Well, these instances cost a fifth of what this other cloud provider’s do.” “Yes, but that other cloud provider [unintelligible 00:19:47] don’t fall over every 20 minutes and have persistent disk.” In the fullness of time, everything’s sort of commoditized to the point where now, in many cases, if you’re just running a bunch of virtual machines on cloud providers, it’s largely a matter of price. The same story has happened in many respects with object store. Do you think that the network will eventually wind up commoditizing as well, or do you think that there’s still going to be significant variances as the rest of the cloud world grows up on top of that bedrock foundation?

Rasam: I think that’s a brilliant question, and I think the answer is yet undetermined. I don’t think it’s clear. I think there are a lot of different approaches to trying to solve for the challenge of networking in a cloud-first world, right? 
And most of the solutions on the market address some subset of the problem: some gets you to the edge of cloud, some really reside on the edge and try to interconnect you to the various clouds that you want to be in, some are meant to help you orchestrate your cloud footprint once you’re in the cloud. The underlying challenge remains that, at its core, cloud networking itself remains extremely complex and extremely siloed.If you zoom out and look at your traditional enterprise architecture, it’s a bunch of siloed solutions that have been stitched together to meet the end-to-end workflow. Well, cloud is kind of a microcosm of that. The same thing happens in cloud is, you have a lot of manual intervention of stitching together the various pieces to meet the end-to-end workflow. None of the existing approaches on the market are really operationalizing cloud from a networking perspective the way DevOps and containerization has done with compute and storage and made it really a seamless part of an end-to-end infrastructure as code strategy. 
So, I think everyone is really trying to tackle that problem in a way that hopefully, the end state will be one that is aligned with the underlying value proposition of what cloud brings to an enterprise.But how that is going to end up looking and whether or not it ends up being a singular sort of end-to-end infrastructure as code strategy that ties the pieces together elegantly, or ends up being all these various piece-parts that are solving a best of breed problem but still need to get stitched together, I think remains to be seen.Corey: One of the things that I think networking has had in common or is at least spiritually aligned with the world of security is that when it isn’t working, “Well, we’re going to go ahead and make things broader and broader and broader, and we’re going to go ahead and grant everything access to everything, and once we get it working, then we’re going to go back and dial that back down because we want to be secure.” Yeah, no one ever remembers to go back and dial things back down. Once it’s working, we’re on to the next ticket, in many cases. So, the complexity doesn’t just act as a drag on feature velocity; it also acts as significant security risk in many environments. How do you folks tackle that, or think about that? Or is that one of those, “Oh, that’s the best kind of problem: someone else’s.”Rasam: I think at the root of that problem is the visibility and control problem because it’s easy to do something, to turn some knobs to get something up and running and then forget about it. And if you don’t ever go and touch that part of your network again, then you can easily end up in a situation like the one that you described. And that’s why we really think of the idea of solving for this problem as needing a new paradigm and a new way of thinking, which is a unified fabric, end-to-end, in a multi-cloud world, with a full stack of network services that addresses the vast majority of the use cases that an enterprise would have. 
So, we’re literally giving you a single user interface for full visibility and control end-to-end for all of your networking use cases, be they on-prem, for your remote users, for your branches, or any of the clouds that you might be in.Corey: When you find that you’re talking to your prospective customers that, in the fullness of time, become actual customers, and they wind up going from, “Okay, this might work,” to, “This is awesome,” what do you find that they’re, first, the most surprised about during the adoption? And secondly, what do you think their biggest misunderstanding along the way was?Rasam: You know, the way that you leverage the Alkira Network Cloud—which is what we call it. We call it Alkira Network Cloud because it is in fact a network cloud that delivers all your full stack of network services in a cloud model. But the way you leverage the Alkira Network Cloud is you go through a multi-step, really simple workflow. So, we have this concept of cloud exchange points, which you can think of as virtual PoPs, and they reside all over the world. So, the first thing you do is you pick your virtual PoP or PoPs—you can have one or multiple of them, as many as you need—and the next thing you do is you attach your sites to this fabric.And there are multiple ways you can do that. You can do that through high-speed dedicated connectivity like AWS Direct Connect, you can do it by extending your SD-WAN fabric into the Alkira fabric, you can do it through IPsec connections, you could do it through remote access for your users. But that’s the first step. And then the next step is to attach your cloud VPCs or VNets. So, you go through a process of providing your credentials for your cloud properties, and you attach the cloud properties to the Alkira fabric, and in the middle, there’s the step of defining your segments.So, you define logically what your segments will be, and then you assign your sites, or your users, or your cloud properties to that segment. 
And literally, I mean, that’s five steps, and at the end of those five steps, you’ve just established end-to-end multi-cloud connectivity from your sites, and branches, and data centers, and users to your cloud properties, with full visibility and control. And usually, that process can take 30 minutes, if you have all of your credentials and the necessary data lined up for what you’re connecting and the sequence that you want to go through, and at the end of that half-hour, people that are new to the platform will stop and say, “That’s it? We’re done? It can’t be that easy.” And in fact, it was that easy. And that’s really the big aha moment for a lot of our enterprise customers that see the platform for the first time of, like, “Wait. This is way, way different than anything I’ve seen before.”

Corey: Your website has a 30-minute challenge for configuring a network, and I haven’t run myself through it yet with a stopwatch, but the fact that you can even make that claim means that there’s something radically different because frankly, it takes that long to find the networking section of the console in many of the cloud providers. You just mentioned your enterprise clients. Do you find that you’re generally working in the enterprise space, or do you tend to have offerings that make sense at the SMB scale? In other words, when is it time to start talking to you folks? Invariably, “After someone probably should have,” seems to be a common refrain, but at what scale does Alkira begin to make sense?

Rasam: Yeah. I think I use the term ‘enterprise’ sort of more generically than your large enterprise.

Corey: Oh, to me, a big company is anything with more than 200 people, so I’m the wrong person to ask on that score. But yeah.

Rasam: Yeah, and I would say I agree with you, and that’s kind of the definition of when I say enterprise for me. Because networking is a horizontal problem. 
Every company needs networking and no matter what the size of your organization, if you’re going into cloud, you’re going to have to deal with the challenges of cloud and operationalizing the challenges of cloud. Now, the larger you are and the more clouds you’re in, the greater the complexity that you have to deal with and the greater the operationalization of that complexity. So, we deal with large enterprises that are deep into their cloud journey and find themselves back-ended into complexity and looking to simplify. And we also have enterprises that are born in—I’m sorry. When I say enterprise, I’m talking about customers that are born in the cloud, startups that are really looking for a simplified and operationally aligned networking solution with the way that they’re intending to leverage cloud. So really, if you’re getting into cloud, and you’re getting into cloud networking, and you have a cloud-first strategy, regardless of the size of your organization, the chances are pretty good that Alkira is going to be a good fit for you.
Corey: Thank you so much for taking the time to speak with me today. If people want to learn more about what you’re up to, how you view these things, or basically take it for a spin themselves, where can they find you?
Rasam: On alkira.com. So, www dot alkira—A-L-K-I-R-A dot com, and take a look at our resources page. It’s packed with great content. And like I said earlier, you really have to see this to believe it, so we’re happy to show you; request a demo and we’ll get online for you and take you through the journey.
Corey: Excellent. Well, thank you so much for taking the time to speak with me. I really do appreciate your being so generous with your time.
Rasam: Thank you, Corey. I really appreciate it.
Corey: Rasam Tooloee, cloud networking evangelist. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment containing the proper Terraform configuration to get IPsec working between two different clouds.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
About Jason
Jason Yee is Director of Advocacy at Gremlin where he helps companies build more resilient systems by learning from how they fail. He also leads the internal Chaos Engineering practices to make Gremlin more reliable. Previously, he worked at Datadog, O’Reilly Media, and MongoDB. His pandemic-coping activities include drinking whiskey, cooking everything in a waffle iron, and making craft chocolate.
Links:
Break Things On Purpose podcast: https://www.gremlin.com/podcast/
Twitter: https://twitter.com/gitbisect
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this.
To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Corey: Jason, thanks for joining me.
Jason: Thanks for having me, Corey.
Corey: So, you’re one of those people that we’ve always passed at conferences and other events, sort of like ships in the night. We hang out in group settings, but strangely, for whatever reason, despite traveling in the same circles for years now, we’ve never really sat down and had an in-depth conversation with each other, to the point where I feel like both of us are sort of wondering on some level, “Does he just not like me?” It’s been one of those items for me of, I want to catch up with Jason at some point and learn what makes him tick. And then pandemic happened. Well, no more. Thank you for talking to me.
Jason: Yeah. And again, thanks for having me. I’ve always felt the same way. We’re always at these speaker dinners, or just hanging out with friends, and for some reason, I’m, like, at one end of the table, and you’re at the other. And we’ve just never had this opportunity.
Corey: Exactly. Because you actually do a lot of good in the community, and I’m usually at the kids’ table. Which is, frankly, what happens, and honestly, it’s the right call. But you and I, I guess, are aligned in a few weird and interesting ways. And—well, let’s talk about what you do. You’re the Director of Advocacy at Gremlin. What is Gremlin, first off, and then what does a Director of Advocacy really do?
Jason: So, Gremlin is a chaos engineering platform, or a reliability platform as we’re trying to sell it now. Because we started out doing chaos engineering, so some of the folks that were doing chaos engineering back at Netflix and back at Amazon, decided, most people aren’t Netflix, most people aren’t Amazon; let’s build something that everybody can use. So, Kolton and Forni, our founders, got together, they started this up. And the idea is really, how can we help people make things more reliable?
And obviously, chaos engineering is one of those ways, so that’s what they started off with. And we’ve got a platform that really just makes that easy and safe to do. So, the second question about what is Director of Advocacy? I know you like to make fun of AWS naming, and I feel like it is sort of a weird, nonsense name because it doesn’t actually explain anything. But essentially, it’s developer relations. So, I have the task of talking to all sorts of folks who aren’t customers—really, just anybody in tech—about chaos engineering and why they should be doing it, and how to make applications and systems more reliable. And then, aside from that, I also get to interact with our customers and help them out. So, I’m a combination of customer success or success engineer slash support slash the advocate side is advocating for their needs within the organization. So, when they make a product request, I pass that on, see what we can do about that. So, it’s sort of a mishmash of all these different roles.
Corey: I want to draw a bit of a parallel between that DevRel slash advocacy slash evangelism universe and the sysadmin world, where then we started calling ourselves DevOps and that led to an enormous schism around is DevOps a job title or not? “No, but it pays a lot better, so yes.” Then SRE. “Well, you’re not a real SRE,” and the rest. It comes down to quibbling over definitions of terms instead of, you know, doing work. And I feel like, on some level, the whole DevRel space has, in some respects, gotten twisted around something that resembles the same axle. Is that unfair?
Jason: No, that’s absolutely correct. There is that question of what is DevRel? How do you define it? And part of that is how do I justify my job? And on top of that, how did—at least pre-pandemic, how do I justify the company spending tens of thousands, if not hundreds of thousands of dollars, not only for my salary but to fly me around the world to get on stage and say things?
Corey: Right.
And it looks, from a distance, an awful lot like, okay, you cost as much as an engineer, you don’t write any code to make what we do any better. Your expense budget is about the same as your salary in some cases, and then you travel far away to what looks like a giant party to hang out with your friends. And you get on stage and say, “I work at company X. Thanks. They’re great. Now, for the next 45 minutes, let’s talk about the right standing desk for you.” And it becomes a very difficult sell internally. And for a group that prides itself on advocating for its company, they don’t often seem to do as good a job advocating for themselves internally.
Jason: Absolutely. There’s always the discussion of KPIs. How do we measure the impact of what developer evangelism, DevRel does? And it’s a hard thing, partly because every company is a little bit different. Because nobody’s really defined this, DevRel often is very fluid and just fills in the cracks of whatever a company needs. So, for some companies that might be doing support, right? I’ve heard people being called DevRel, and they literally are just on forums all day answering questions, or writing documentation, or speaking. So, it’s really just this nebulous thing of whatever a company needs.
Corey: It becomes almost this weird expression, in some respects, of marketing. Of course, a lot of DevRel folks will scramble at the objection, “Oh, we are not in marketing.” And that’s always said with a very sneering tone towards marketing because those people are terrible. I argue that marketing is, A) wildly misunderstood, B) incredibly valuable, and C) where DevRel in many respects finds its spiritual home because it’s very hard to tie your marketing budget as a company to definable results and do attribution effectively, but there’s clear value to the company in things that can’t necessarily be measured, or at least not without a heck of a lot of work. That is the piece, in many respects, that DevRel is missing.
But the first thing that they want to make clear is that we don’t work for marketing. It’s a very weird feeling.
Jason: It’s very weird because as I explain that DevRel often is filling in the cracks and is very fluid, that’s because my personal perspective of DevRel is inclusive. I try to get involved in as many teams as I can, so I’m constantly working with engineering, and with marketing, and with customer success, and really everybody. And then on the flip side, you have people that define it by what it’s not. I’m not marketing, I’m not this. And you end up cutting yourself off.
Corey: And neither are you an accountant, but I didn’t ask if you were, so yeah.
Jason: But at the same time, you’re not an accountant, but you should have some sort of notion of what the finances of the company are because that gives you some sort of indication on whether you’re going to get laid off, for one, but also just for the success of the company. And I think maybe it’s just the engineering mindset that I’ve had from being an engineer, where you take everything and you try to learn everything that you can and put it together. And so, for me, that comes from having experience working in marketing, having experience working in engineering; how can I put these things that I know together to solve a problem? So, rather than saying, “I’m not marketing,” I’m going to ignore that because as you mentioned, marketing’s super valuable, especially the way that they’ve done data-driven marketing now. It used to be like Mad Men days, you’d throw up a billboard, and who knows if it works, but you paid a bunch of money for it. And now they’re so data-driven, and everything’s tracked. And, yeah, you may not be able to directly connect a few things, but you get a much better sense of where your value is, and where your time should be spent.
Corey: Absolutely.
And you can get—I don’t know—80% of the way there, and then the last 20% will drive you mad, so at some point, you just shrug, give up, and that’s okay. Similar in many respects to an AWS bill. It just becomes such a weird process to explore. And from a certain lens, when you have those cross-cutting functional types who are doing DevRel, they start to sound almost like enthusiastic amateurs in the various disciplines that they bring together. “Yes, I’m an engineer, but not as deep on the engineering side as some of my colleagues who do engineering 40 hours a week and then some.” “Oh, we’re part of product.” But strangely, to work in product you usually have significant experience and training in how to conduct user experience studies and user interviews, whereas an awful lot of the DevRel input back to product is ‘word on the street’-style stuff.
Jason: Yeah. And both are extremely valuable. It’s obviously very valuable to have that process of doing user studies and actually getting that hard data, but as we all know, that word on the street and what’s the general vibe of folks at a conference or folks at a meetup really informs things that usually don’t get asked in those formal user studies.
Corey: Completely. And telling stories from my own world, back when I was, you know, having a real job and able to be fired by a whole bunch of different people—and was—there was the constant justification story of why should you go to that conference and speak? Why would we spend that money? Why shouldn’t it just be a personal thing that you take vacation for?
Now that I own the company, it’s a different story because I know that when I go out and participate in the community, good things happen, but I don’t have the need anymore to justify it, other than to myself and possibly to my business partner. There are very real stories that I’ve looked at here where I go to a conference, I start talking to someone, we keep in touch, they wind up changing companies, we continue to talk, suddenly, they have an AWS bill problem, and now they become a customer. Yeah, it turns out that’s super hard to predict when you’re looking at flight prices to go to that conference in the first place. And there are many other conferences that nothing came out of, I think, but you never really know.
Jason: Yeah. One of the nice things about my job and one of the reasons that I joined Gremlin was the idea that chaos engineering is still pretty new. And so in my past experience with DevRel, it very much was your exact experience; how has what you said on stage or the introduction of our brand to an audience made an impact? And since chaos engineering has been so new, I’ve gotten to take a little bit of a step back from that. Obviously, I want people to get Gremlin or to try Gremlin, but even if folks just try chaos engineering and have a better understanding of it, that’s a big goal of my job. That means that I win if you try chaos engineering, even if that’s with an open-source tool. So, that’s one of the reasons that I’m super happy about where I’m at right now in terms of DevRel is, I get to be DevRel for an entire practice, rather than just a company.
Corey: And, on some level, you get to define what success and failure looks like among your team. But turn it around for a second; how do you wind up articulating the value and story of what you do to the larger business? Because I’ve seen the approach of you can’t measure DevRel that way—regardless of what that way is—and it’s always this, don’t ask us for metrics.
Don’t ask us to really, functionally, be accountable for much. And from a business strategic point of view, where you’re not deeply involved with aspects of what that leads to, “Okay, so it rounds to zero, and wow, I’m spending an awful lot of money on something that doesn’t really add any value. I could spend that money on things that do instead.” And then you see a bunch of negative things happen. Like, as soon as there’s a layoff or a downturn, that entire group winds up getting decimated in some cases, even when, in reality, that’s the thing that should be invested in the most.
Jason: Absolutely, yeah. One of the things that I’ve always loved is people talk about metrics. And yes, we definitely get that from the marketing side. And so I do have metrics on things like how many workshops we run. And those people, obviously, we capture those leads, they go through the marketing funnel, et cetera, et cetera. But then there’s the idea of how many engineers out there have those same metrics? We always complain about you shouldn’t count the number of lines of code because that’s stupid. You shouldn’t count all these other things. But generally, most engineering teams are working off of quarterly OKRs or some sort of time period, what those goals are and the product that they’re going to ship. And so I’ve tried to adopt the same thing in every DevRel organization that I’ve been in, is what are the high-level goals? And if you can get leadership to buy off on those, for example, we’re currently working on an online learning platform. We don’t have tight metrics about how many people should be registered and complete the course and be certified yadda, yadda, but we have a good sense that if we build this, it’s going to be very beneficial in a number of ways. And leadership agrees, and they’ve bought off on that, and they’ve signed their names to it.
And so for us, what does success look like in terms of this is actually implementing that and shipping it.
Corey: It’s a really strange and really powerful thing, but you take a look at so many different companies who have done well and companies that haven’t done well, and the way that they engage not just with the ecosystem, but with the community specifically, in many cases seems to be the path that it follows. I mean, not to pick on them unnecessarily, but Chef had a wonderful community; they engaged absolutely flawlessly, from what I could tell, even when I didn’t agree with people or particularly like them in some cases, the people who worked at Chef almost demanded respect, and it was pretty clear, even as someone who didn’t use it myself, that they were a force to be reckoned with. And then they wind up effectively losing a lot of the people that made it special, the community moved on, they sold it to a company no one had ever heard of, and now it’s one of those, oof, they deserved a better end. Maybe that’s unfair, but that is the perception.
Jason: Yeah, I would say the same thing sort of happened with Puppet, the idea that they built a nice community, and back to my point of, like, you have a project, you work on shipping that, you don’t really track those numbers. That’s what I saw from both communities, Chef and Puppet, is they had these strong communities, they were doing things, and the goal was the community. And I don’t know—I haven’t talked to Nathan, I haven’t talked to folks at Puppet, but I suspect that they weren’t simply about how many people—like, what’s the total number of people that we would say are in our community? There was a value on, we want to do this thing and we have a sense of the quality of the community, and how much people just are engaged, and interested, and want to help each other.
Corey: The piece that also gets lost as well is companies are out there to turn a profit.
And building a vibrant open-source community who love your open-source offering but aren’t in a position to either champion or purchase the thing is often viewed as a complete waste of time by the business. So, they in turn, then pivot business models and do things that insult or alienate the community, and suddenly are perplexed by the massive groundswell of negative publicity they get, of people actively advocating that companies not use them. And their position is somewhat understandable in a form of, “What the hell is this? You weren’t spending money on us before. Now, you’re still not spending money on us, but you hate us. What gives?” Community is a weird thing to wrap your arms around.
Jason: Absolutely. I would say it’s hard to wrap your arms around it when you’re not valuing the relationship. It’s like any relationship where you have ulterior motives. If you can’t actually connect with people, it’s never going to go right.
Corey: No. And it also can’t be self-serving, or seem to be self-serving—spoiler, the best way to make sure you’re not perceived a certain way is to not actually be that way—we take a look at Last Week in AWS, my newsletter, it is explicitly aimed at people who want to keep up with what’s going on in the world of AWS, which is fair. It is not aimed at people who have a big AWS bill and don’t know what to do about it. And sure I reference periodically in that newsletter what I do, but it’s not a sales piece. It’s not every week hammering home, buy whatever it is I’m selling because that’s how you alienate and lose the audience. I’ve always felt that by being top-of-mind for the problem and reminding people I exist every week with something that’s useful and ideally a bit funny, then, when they have that expensive problem, they’ll think of me.
That was my theory four years ago, and I’m still here, so apparently, it wasn’t completely off base.
Jason: Yeah, well, that works, right, because nobody wants to subscribe to a newsletter to hear about the service. If they knew they needed your service, they would just buy your service. So, what’s the value of the newsletter? What’s the value that you’re offering to people? And that is, well, the fact that there’s so much freaking news about AWS every week that it does require a newsletter. Similarly for me, what’s the value? Well, if people knew that they needed Gremlin, they would just come talk to me. But they don’t. They’re concerned about the needs that they have, about how do I build a more reliable application, “My stuff’s always breaking. I’m having too many incidents. I’ve done everything that I can think of. What’s next?” So, it’s just offering that.
Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.
Corey: And let’s be very clear here, you have a much harder challenge than I do. Because it turns out that you don’t need to be deep into the weeds of corporate finance to understand the concept that wasting money on the AWS bill might not be the best thing in the world.
Once you get more into the nuances, you start to realize, “Oh, being able to predict the AWS bill sounds super awesome, too.” But none of those are a particularly heavy lift, whereas, “Wow, your site is crappy and falls over a lot. Have you considered breaking it on purpose?” Sounds deranged the first time someone hears it.
Jason: Absolutely, yeah. That’s the number one thing that I hear all the time is—and people joke about it. I don’t need chaos engineering; I do regular deploys.
Corey: That sounds almost like someone was sitting in a blameless post mortem and got carried away trying to keep it blameless because otherwise, it was going to be their fault, and accidentally invented an entire field.
Jason: Yeah, yeah. I mean, it’s definitely blameless if everybody is causing things to break; then we all share the blame. It is a funny thing. It’s a tricky thing to sell to people and I think it’s tricky because we have these misconceptions about what that actually means, the idea of breaking things on purpose. And trying to move away from that because the breaking really isn’t the goal. And oftentimes, you’re not actually even breaking things; you’re stressing them out or you’re simulating things, so nothing’s really broken. But once you start thinking of it as that idea of I’m going to test my assumptions, right? I think that things work this way, but I don’t know, I’m not super confident that it actually will do that. And we do that all the time when we’re developing applications or infrastructure. I set things up, I’m pretty sure that it’s going to work a certain way. Documentation says that this app works this way. Does it actually do that? Well, I can either find out when it doesn’t do that at some random point, or I can actually try to force it to act in that way, or to encounter that bad environment that I’m a little suspect about. And so we do this all the time with other things.
And oftentimes, we’ll do this just mentally as, “What would happen if—” and you kind of play it out in your mind. And that’s actually a great way to start with chaos engineering, rather than actually doing it, just that mental game. “What do you think would happen if this goes wrong?” Play that out in your head. Cool. Once you’re comfortable with that you’re like, I think this is what my next steps would be. I’m pretty sure there’s documentation here, or I’ve gone and checked and assured that there’s docs, or run books, or whatever, why not give it a try?
Corey: It’s one of those areas where what have you got to lose? I mean, as you just said, your site breaks all the time anyway, before you even touch its stability, what happens if the database just suddenly increases latency through the roof? What happens if suddenly all of us-east-1 is hard down? In many cases the answer is, we don’t really care about our website anymore because the world is not going to care about the internet not working that day, in the context of what we do. In other shops, yeah, that matters, and we kind of still need the power grid to work. So, there’s a definite question of what failure modes are worth planning for and what aren’t, but even going through that exercise is fantastic. I used to do things like that from a sysadmin perspective, asking companies when I was asked to build out a mail server. “Great, how much downtime is acceptable?” And they said, “Absolutely none.” I said, “Great. I’ll need a budget of $20 billion to start, and when that runs out, I’ll come back for more.” And they said, “Wait, what are you talking about?” And we said, “Oh, now we’re negotiating with the business.” And it turned out what they really meant was, “It would be nice if the mail server worked during business hours most of the time.” And, “Oh, okay. I can do that for slightly less.” And it really just came down to what do you value? What is important to your business?
Jason: Yeah.
How much reliability do you need? Although one of the key things that I always point out is, a lot of times people are like, “Oh, you don’t need 99.9% reliability; you could probably get by with less than 90 because people aren’t using your application at night, they’re not using it on the weekends, yadda, yadda.” The other problem with that, though, is you rarely control when those outages happen. So sure, if it happens in the middle of the night, and nobody’s using it, great. Just keep sleeping. As you start to work on this, though, there is the idea of it could happen at any time, so let’s actually test things to ensure that if it happens at the least opportune time, things actually work the way that we expect.
Corey: And that’s an incredibly valuable thing. See, you’re already convincing me on this. And clearly, you’re very effective at that advocacy role. How do you hire and how do you determine who’s a great fit? Because I’m imagining that bringing someone in, in an advocate role, and their position being, “Oh, at no point can you ever measure me in any context, and just assume that what I’m doing is amazing and great.” That becomes a hard thing to do. When I was talking to companies about possibly doing evangelist style roles, years ago, I asked, “How will you know if I’m being successful in this job?” And one of the answers was, “Well, you speak at a certain number of tier-one conferences a year.” “Cool, what are those?” And, they listed off a bunch and cool, there’s only one in that list that I’m not scheduled to speak at this year, so do I get a raise? People try and aim at the wrong thing in their quest to articulate what they really value, but what they really value is hard to measure. So, how do you evaluate people on a basis of are they doing what they should be doing, or are there ways that they can be coached to improve, or are they just not effective in the role at all?
Jason: Yeah.
Well, I think you mentioned two great things, are they doing what they’re supposed to be doing? And it comes back to every quarter, we’re laying out the goals of what do we want to accomplish this quarter? And we make them achievable, so hopefully, by the end of the quarter, you’ve achieved this thing that not only the team, but senior leadership has decided is a good thing for the company. And to that point, if it’s not, if we do that thing and nothing happens, and it’s—or it’s bad for the company, at least we can say, “Hey, senior leadership, you are the people that thought this was a good idea, too.” But that said, we try not to do the blame. We try to iterate on things and experiment a lot. Especially at Gremlin, we’re all about experimentation, so we’re constantly trying things. But ultimately, it’s are you getting this thing done that we’ve agreed that we’re going to get done? But you also mentioned that second thing about growth. I think that’s something that I always look for with anybody, whether that’s DevRel or engineering. I want people that are interested enough in the job that they want to do it well. There’s something about it that they really love or they’re really into, and they want to master that. And so part of my goal as a leader is trying to help people along that path of what do you find interesting? For example, last year, we were working on those tiers, as we’re trying to figure out what does it actually look like. Because we’re a really small team at Gremlin, and so as I’m starting to consider how do I promote people? What are the various, like, levels or tiers of going from an advocate, to a senior advocate, to whatever is beyond that? So, I asked the team, really, “What do you think that would look like? What do you think the next level for your career is?
What is the thing that you want to master?” Because ultimately, people have more investment when they’re choosing their destination and they’re choosing their direction. And so if I can help people do that, just define what’s the next thing that you want to tackle? What do you think mastery or the next level of your career looks like? How can we help you get there? So, that’s what I aim for.
Corey: For better or worse, it seems to be working. I remember back when Gremlin was a rando startup idea a couple people had and now I’m starting to see you folks, basically everywhere.
Jason: Yeah. Again, we’ve got a small team, but it’s a great team. So, Ana Medina has been on the team, actually, before I joined, but she’s been doing a fantastic job and she has been working on a lot of our educational outreach. And then Pat Higgins on the team actually started on the engineering side. So, he was one of our front-end engineers; he’s been working on a lot of really great tools. He helped me restart the Break Things On Purpose podcast. So, we’re into season two of that now—and by the way, we should have you on that show as well. But yeah, we’re doing a lot of fun stuff, and folks are happy. So, try to keep them challenged, and we’ll see what’s next.
Corey: Yeah, I’m really looking forward to seeing how the story continues to evolve. It’s a fascinating field that went from, “That is ridiculous,” to, “Oh, that’s great but it would not apply to what I do,” to, in my case, it actually would not help me in any way with what I do because it turns out, well, what if an AWS region goes down and you can’t produce your newsletter the usual way? Oh, I’ll write it by hand that way because suddenly I have a much bigger story to talk about that week.
Jason: I am curious, though, speaking of having you on the podcast.
Oftentimes, we talk about reliability, and having never had to deal with AWS bills because they always go to somebody else in finance, I am curious how reliability ties into the cost of what you’re paying for AWS? Because I can imagine things like—a common thing that we hear about is, “I’m moving a lot of stuff to Lambdas.” Like, great. Serverless. It’s cool, it’s hot. How is that charged?
Corey: Right.
Jason: Obviously, by time.
Corey: Oh, yeah.
Jason: So, if it’s charged by how long something takes, what if your latency goes up? What if your resources are constrained? How does this actually affect things? And how does that impact how you think about reliability not just from a is it up or down? How’s my customer looking at it? But maybe from what your AWS bill looks like?
Corey: I love where you’re going with that. And the conversation everyone loves to have is about three levels beyond where most companies actually are. Easy example that sounds like something in the distant past, but it’s very real today: I want to store data in multiple availability zones for durability purposes and making sure that we are reliably up. Well, every time a gigabyte crosses an availability zone boundary, that costs two cents. And then you have to pay to store it twice.
So, there’s a question of how much is having multiple sets of that data worth? And the cloud-native answer to that is, “Oh, put it in S3. There’s no cross-charges there. Their durability is ridiculous, and you can access it a whole bunch of different ways, provided your application supports it.” But that’s not a fit for everything.
And you find that saving money, and being reliable, are at some point completely at odds with each other. And this is, incidentally, why we don’t do this as a tool, we do it as a consulting engagement. There are times where, for business purposes, you will want to spend more on reliability.
Because saving money that accidentally takes your company down for a month is not money you should be saving.
Jason: Yeah.
Corey: Now, the real fun thing I want to see from Gremlin one of these days from an implementation perspective is, just for fun, we’re going to run a chaos injection experiment where we decide to cancel the credit card tied to the account and then also remove the increasingly frantic alerts from your email when that happens, and see how long it takes you to realize the giant single point of failure that no one really thinks about existing, but absolutely does.
Jason: So, I am curious, for folks that are listening who are engaged with the chaos engineering community, or at least follow Corey’s newsletter and have seen updates, AWS has announced their own chaos engineering tool, the Fault Injection Simulator, which to Corey’s skill of poorly named things, that actually isn’t a simulator. It does inject real faults, so it may be—S should be service. One of their faults, though, that they can do is API throttling, which essentially could simulate the idea of, you haven’t paid your bill; we’re turning things off. So, Gremlin is working with the AWS folks, we’re trying to figure out great ways that we can work together so that people can use both Gremlin and AWS FIS. So, I’ll let you know if that becomes a thing, and maybe we can get some API access to billing as well.
Corey: I’d love to see it. Please keep me looped in. Thanks so much for taking the time to basically go all over the world of DevRel and probably make some lifelong enemies in the process. If people want to hear more about what you have to say, where can they find you?
Jason: Yeah, I’m on Twitter. My Twitter handle is @gitbisect—and by the way, if anybody tweets about Git bisect, it is a fantastic tool, fantastic utility within Git—oftentimes, I will respond. But that’s where to find me on Twitter. Otherwise, you can find me on [unintelligible 00:31:30] podcast, Break Things On Purpose.
It’s available in all the platforms.
Corey: Excellent. We will, of course, put links to that in the show notes. Thanks so much for taking the time to speak with me. I really do appreciate it.
Jason: Yeah, thanks, again. It’s been long overdue, and I’m glad we finally made it happen.
Corey: Awesome. Jason Yee, Director of Advocacy at Gremlin, I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment saying that the best thing to test breaking in production is your DevRel team.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.
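Editor's aside on the cost questions raised in this episode: Jason asks what happens to a duration-billed Lambda function when latency goes up, and Corey cites roughly two cents per gigabyte crossing an availability zone boundary plus the cost of storing the data twice. The sketch below works through that arithmetic. It is a minimal illustration, not a pricing tool: the per-GB-second and per-request rates, the storage rate, and both function names are assumptions introduced here, and only the two-cent cross-AZ figure comes from the conversation.

```python
# Illustrative rates. These are assumptions for the sketch, not figures
# from the episode and not a substitute for the current AWS price list.
PRICE_PER_GB_SECOND = 0.0000166667   # assumed Lambda compute rate
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed Lambda request rate

# From the conversation: about two cents per gigabyte crossing an AZ boundary.
CROSS_AZ_PER_GB = 0.02


def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate a month of Lambda spend for a function billed by duration.

    Compute cost scales with invocations * duration * configured memory,
    so a latency regression shows up directly on the bill.
    """
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


def cross_az_replication_cost(gb_written: float, storage_per_gb_month: float) -> float:
    """Estimate the extra monthly cost of keeping a second copy in another AZ.

    Counts the transfer across the AZ boundary plus storage for the second copy.
    """
    transfer = gb_written * CROSS_AZ_PER_GB
    second_copy_storage = gb_written * storage_per_gb_month
    return transfer + second_copy_storage


if __name__ == "__main__":
    healthy = lambda_monthly_cost(1_000_000, avg_duration_ms=100, memory_mb=512)
    degraded = lambda_monthly_cost(1_000_000, avg_duration_ms=200, memory_mb=512)
    print(f"healthy latency:  ${healthy:.2f} per month")
    print(f"doubled latency:  ${degraded:.2f} per month")
    # storage_per_gb_month is a hypothetical block-storage rate.
    extra = cross_az_replication_cost(1000, storage_per_gb_month=0.08)
    print(f"second AZ copy of 1000 GB: ${extra:.2f} extra per month")
```

Under these assumed rates, doubling average duration doubles the compute portion of the Lambda bill while the per-request portion stays flat, which is the link between reliability and cost that Jason is asking about. Whether the second copy in another availability zone is worth its price is the business judgment Corey describes.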
About CassidyCassidy is a Principal Developer Experience Engineer at Netlify. She's worked for several other places, including CodePen, Amazon, and Venmo, and she's had the honor of working with various non-profits, including cKeys and Hacker Fund as their Director of Outreach. She's active in the developer community, and one of Glamour Magazine's 35 Women Under 35 Changing the Tech Industry and LinkedIn's Top Professionals 35 & Under. As an avid speaker, Cassidy has participated in several events including the Grace Hopper Celebration for Women in Computing, TEDx, the United Nations, and dozens of other technical events. She wants to inspire generations of STEM students to be the best they can be, and her favorite quote is from Helen Keller: "One can never consent to creep when one feels an impulse to soar." She loves mechanical keyboards and karaoke.Links: Netlify: https://www.netlify.com/ TikTok: https://www.tiktok.com/@cassidoo Newsletter: https://cassidoo.co/newsletter/ Scrimba: https://scrimba.com/teachers/cassidoo Udemy: https://www.udemy.com/user/cassidywilliams/ Skillshare: https://www.skillshare.com/user/cassidoo O’Reilly: https://www.oreilly.com/pub/au/6339 Personal website: https://cassidoo.co Twitter: https://twitter.com/cassidoo GitHub: https://github.com/cassidoo CodePen: https://codepen.io/cassidoo/ LinkedIn: https://www.linkedin.com/in/cassidoo TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. 
I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. 
To learn more, visit lumigo.io.
Corey: I’m Corey Quinn. I’m joined this week by Cassidy Williams, principal developer experience engineer at Netlify. Cassidy, thanks for joining me.
Cassidy: Thanks for having me.
Corey: So, you’re famous in many circles for things that have nothing to do with your actual job. Or at least that’s the perception. So, let’s at least start there because I’m not sure we’ll get back to it. What is Netlify? And what does a principal developer experience engineer do at such a place?
Cassidy: Yeah, so the shortest answer is, it’s a place where you can host your website. The longer answer is it’s a whole development workflow. You can build whatever types of complex websites that you want, and we make it very easy to get it up and running. And my job there is on the developer experience team. And basically, what we do is we are developer experience engineers. We try to build things and show developers how to make their apps, their websites, their various products, and projects easier to build on Netlify.
Corey: Sort of the whole idea of what I used to think of, I guess, as static websites and various ways to host it, which I think is now called Jamstack. But that probably also misses a fair bit of nuance because I’m going to be completely transparent here: I am crap at all things frontend.
Cassidy: It takes all kinds to make a project work. Yeah, so it is more than static. I like to think of it more as static first. The way I’ve defined Jamstack, that kind of clicks with most people is, writing Jamstack—and for those who don’t know, it initially was an acronym, where it was: JavaScript, APIs, and Markup stack. And so, it’s less about technologies and more about the philosophy of building websites.
But the philosophy of it is, it’s kind of like building mobile applications, but in the browser, where you try to build as much as you can upfront, and then pull data in as needed.
Because in a mobile application, when you have something native, you don’t, server-side, render the UI every single time. The UI is built pretty—
Corey: Well, not with that attitude anyway.
Cassidy: [laugh]. That’s true. That’s true. But when you’re on a mobile app, you don’t normally pull in the UI every single time. It’s built-in, and then you pull in data as needed; sometimes it’s local, sometimes it’s on a server somewhere. And that’s what Jamstack is all about. It’s building as much as you can upfront and then pulling in data as needed.
Corey: The idea is incredibly compelling, and it gets at an emerging trend that I don’t think that there’s any escaping, and—maybe this is overblown, I’d love to get your feedback on it—I can’t shake the feeling that JavaScript is the future—not necessarily a frontend—in general, when it comes to, effectively, computers. We’re seeing it on the backend, we’re seeing it on the frontend, the major cloud providers are all moving in a direction of approaching folks who have JavaScript experience, and that’s the only certainty in that persona that they wind up identifying. It is very clearly not going away while getting more capable. Is that fair? Is that missing something? What’s the deal there?
Cassidy: I keep hearing there’s, like, a rule that people are saying, like, “If it can be built in JavaScript, it will,” because I think it started as kind of this toy language that people didn’t really take seriously. But it has not only become more powerful, but also browsers have become more powerful too, and you can just build more and more with it. And because it’s kind of a low barrier-to-entry language, it’s relatively simple to at least initially learn JavaScript before you get into all the nuances of everything, that I think, just because there are more people using it and it’s easier and faster to pick up than something like assembly or C++ or something.
I hesitate to make generalizations because you never know, but it does feel like that sometimes, that JavaScript is just the way that things are going.
Corey: And I admit, a couple of times I have tried to get into the JavaScript world, and it isn’t clicking for me. My lingua franca is crappy Python. And it’s just crappy enough to run, but it’s neither elegant nor well-designed. It is also barely functional. And every time I have brought in an actual developer to turn some of my scripts into something a bit more robust, they ask me what it does, they smile and nod a lot and never take their eyes off me for a second, and then immediately get rid of everything I might possibly have touched.
This is, of course, a best practice where I’m involved. But it runs. Like, “This is the worst code I’ve ever run.” “Ah, yes, but it does run.” The problem I have with JavaScript is that I do not understand it. The idea of asynchronous calls on a browser completely melts my brain whenever I look at it.
That’s caused a few of my early naive mistakes where, “Oh, go ahead and set this value and then use it here down below, and—wait. Why is it completing before it has that value and it’s not you—what is going on here?” And now I understand the general principles of it, but I’m still getting lost and confused in the weeds. Now, is this just another expression of being secretly terrible? Or is there a nuance here that I’m not picking up on?
Cassidy: I was smiling the entire time you were saying this because I feel like almost everybody who is new to JavaScript coming from another language has had the exact same issues. So, you’re not alone, and you’re not a total idiot. [laugh].
Corey: So, I decided that it was time to learn it the second time, and I—all right, I’m going to break my own rule, which is the way I normally learn something new is I’ll dive into it and start building something and then we’ll see what happens.
Sure, it means I’m a full stack overflow developer, and my primary IDE is copying and pasting, but I can get something sort of functional that works. That approach wasn’t working for me, so what I did on my second attempt was odd. I’m going to go actually do the unthinkable for me, and read some documentation and/or some tutorials.And I was almost immediately blown off course there because suddenly, I find myself just wandering onto what I can only describe as a battlefield between all of the different frameworks I could have chosen between, and it seemed like the winning move was not to play. What am I missing? Are these frameworks hard requirements for doing anything that even remotely resembles frontend in a responsible way? Are they nice-to-haves? Is it effectively an aside current debate that I got suckered into and lost the forest for the trees?Cassidy: You probably got sucked into many debates because there are so many in this world, I do not think you need a framework to do complex web apps or any web apps. I mean, my personal website, as much as I love React—and I’m deep in the React world—I did that with vanilla HTML, CSS, and JavaScript, and that’s all it is. And plenty of the projects that I do, I start with vanilla, and then I add React as needed. I think it’s something where these frameworks, you don’t need them, but it’s really nice once you start building large applications where you don’t want to reinvent the wheel. Because there have been plenty of times on my own projects on other projects, where I start to basically start implementing state-driven components, and trying to parse templates and stuff that I end up making for myself. Where if I did React, I probably wouldn’t need to actually implement all of those. And so you don’t need these frameworks. 
That being said, they can be very helpful as you make more complex projects.
Corey: So, I periodically post an architectural diagram of the pipeline slash workflow thing I use to write my newsletter every week. And I was on the verge of just hiring a frontend developer to build something frontend because it turns out that there’s not a great experience in using a whole bunch of shell scripts that require a CLI to post at random API endpoints. And then I discovered Retool, which is one of those low-code tools that more or less is Visual Basic for frontend. It was transformative because suddenly, it’s, “Oh. When I click this button, make this query that hits some API that I can define,” and oh my stars. It was transformative, and I was actively annoyed I hadn’t discovered it years ago.
Cassidy: [laugh]. Yeah, all of those low-code tools for web devs, they’ve been growing, that is a really interesting realm of the web that I’m curious about. I’ve played around with quite a few of them, and some of them, I kind of end up just wishing that I built it myself in the first place, and then for some of the others, I’m like, “You know, this saved me some time.” And yeah, I think those things are really, really powerful. I don’t know if they’ll ever fully replace having an actual developer, but for a lot of individual smaller tasks, it’s really nice to not have to, again, reinvent the wheel.
Corey: And you’re right. These tools are getting more capable. The problem I have is, whenever I talk to the teams building these things, they’re super excited about them and can’t wait to show them off. And then I say, “Just a quick question. Of all your engineers here, how many of them don’t know JavaScript?”
And the answer is always the same. None of them? Great. Yeah.
Now, there’s an opportunity to present this to existing frontend developers so they can get back to what they were doing when they build a quick internal tool for someone else in a business unit, but there’s an entire untapped market of people like me who don’t understand JavaScript. So, when we see these things described in JavaScript context, it looks like it’s not for us, even though it very much is. There’s something to be said for making things accessible to an audience that would benefit from them.Cassidy: Yeah. I’ve actually given a few talks where it’s geared towards a backend developer who might want to dip their toe into frontend but have no idea where to start. And that is a whole world of people who are like you who just don’t understand the DOM in the browser, and how the interactions happen, and how the async await stuff works, and how promises work and everything. And they’re very weird concepts that just aren’t in other parts of programming, typically. And I think that’s a marketing problem where a lot of these low-code tools or no-code tools don’t understand the opportunity that’s available to them.Corey: I think that there’s a misunderstanding in many respects, where I’ve also seen a fair bit of, I can only call it technical bigotry, I guess, is the best framing here of, “Oh, where frontend is easy, and backend is the hard stuff, and that’s really where it’s at.” And having worked with qualified teams on both sides and looking at all the intricacies on both sides, where the hell does that come from?Cassidy: You know, I think it just comes from the past.Corey: So, do I. And I don’t agree with it. It’s just such a misunderstanding and a trivialization of such a valuable area of things. It kills me every time I see it.Cassidy: Yeah, it’s frustrating, I admit, because I’ve faced that a lot in my career. I actually—I used to do backend. 
I used to do Python stuff, and I have a computer science degree, but plenty of times, there’s some kind of backend dev who’s just like, “Eh, well, I know HTML and CSS, so I know frontend.” And that’s about it. Or they’ll say, “Well, do you really need to know this kind of algorithm or this way of doing things in an optimized way because you’re just putting a pretty face on the data that we’re producing for you.”And it’s an annoying sentiment. And I really think that it’s just from a previous time because a long time ago, from five to seven to ten years ago, that might have been more true because we didn’t have some of these frameworks that do a lot on the frontend. And we didn’t have things like GraphQL, and really powerful tools on the frontend. Where back then, it was a lot of the backend doing stuff, and then the frontend making it look good. But now the work is distributed a bit more where our backend teams, I can say, “Build however you want. You can change your language to Rust, to Go, to whatever, do whatever you want; as long as the data is exposed to me, I can use it and run with it.”And then all the routing ends up happening on the frontend, all of the management of that data happens on the frontend, all of the organization and optimizing for the browser happens on the frontend. And so I think both sides have been empowered in recent years in that regard because, again, with that modularity, you can scale a lot better, but those lingering sentiments are still there. And they’re annoying, but unfortunately, we’ve got to live with them sometimes.Corey: So, let’s talk as well about, I guess, sort of the elephant in the room. Your Twitter feed is one of the most obnoxious parts of my day, specifically because every time you post something I am incredibly envious about the insight it provides, the humor inherent in it. “I wish I had thought to go in that direction,” is almost always my immediate response. And, ugh, it kills me. 
Let’s talk a little bit about that. How did it start? And how is it continuing?
Cassidy: That’s a good question. So, I’ve always been a bit of a clown, both on and off the internet, but I was never very, very public about it, for a while there. Either that or just had a small audience and people were just like, “There she goes again. Maybe she’ll shut up someday.” And so I’ve always had those little drops of humor where I can because I think I’m amusing myself at least.
But about a year and a half ago, I discovered TikTok. And with TikTok, basically, it has such a good video editor—that was the only reason why I got the app because it made it so easy to make videos on my phone—where I was able to suddenly not just type my tweet jokes and my snarky humor, I could make a video about it, I could add music to it, I could make a dumb face. And people seem to like it, and it’s worked out.
And I try to approach things rather from a realistic or educational perspective first and then drop in the humor later, I don’t try to lead with the joke, but at the same time, it’s always fun to have a joke in there because people like to say, “Oh, something funny is happening. I’m getting ready for it.” And it’s kind of fun that I’m able to do that a lot more now that people actually expect humor. [laugh].
Corey: When I was an employee—which I was, let’s be very clear here, terrible at. There is no denying that—it was always a problem for me where the biggest fear that anyone had—start to finish—was that I would open my mouth and say something. And credit where due, my last job was at a large finance company. And at that point, they’re under such scrutiny that anytime someone opens their mouth on anything, it has the potential to trigger an SEC investigation, and no one knows what I’m going to say. Yeah, there’s a lot of validity in being concerned about that.
I felt like I couldn’t ever just shoot my mouth off and be me.
And I always had this approach of, no company in the world would ever be willing to tolerate my shenanigans, therefore, I should never look to either do these things in public or later, go to be an employee again. You’re living proof that it is in fact possible to have both.Cassidy: Yeah. It brings a levity to our very serious industry—I used to be in FinTech; I know how serious that can be—but then just in tech in general, a lot of tech people take themselves way too seriously. And I understand we’re doing awesome work. Some people think they’re gods because they can think something and make it into an app. There’s ego there, but I feel like making fun of the problems, pointing out the problems in the industry and, kind of, just making light of it and making certain tech jokes and making certain concepts humorous as well as educational, I think bringing that approach to things is just really, really effective.And I’m really happy to be on my team, honestly, at Netlify because a bunch of them are just dorks [laugh] where pretty much every single meeting, we try to make it a little bit fun. And it makes our meeting so much more enjoyable and productive because we’re not just seriously staring at our screens and saying, “Okay, let’s make this decision for our OKRs,” or anything like that. We have a good time in these meetings while being productive, and it makes for a really nice team dynamic. And I think there should be more of that, in general, in tech.Corey: One of the things that you have always done with your platform that I am, I guess, slowly warming up to is that you’re never mean, or in the rare occasions where you punch at something, it’s a dynamic; it’s not a company and it’s not a person. I have a strong rule of not punching at people, but large companies have always been fair game from my perspective. And that is a mixed bag. 
Yours is—how to put this—unrelentingly positive where it’s always about building people up, and shining a light on things that used to be confusing, and reminding people that they’re not alone in being confused by those things. And that’s no small thing.Cassidy: Yeah. I appreciate you noticing that. I do try to do that, not only, necessarily, to be just like, “I want to be the positive star in tech,” but also because you never know what someone is dealing with, and someone might be pretty mean, and there have been plenty of people who have said some not great things towards me or towards other people and that cuts deep. And so I do try to avoid those kinds of pointed things. Believe me, it’s difficult; sometimes I do just want to call people out and be just like, “I know what you did to this group of people, and I hate it.” But you never know what people are going through, and I’d rather just make sure that the people who are doing well are the ones who are uplifted, and they get the attention that they need, or deserve, rather.Corey: I did a little research—I know, I know; shock—before I wound up inviting you here, and it’s not just your Twitter account. It’s not just your TikToks, it’s not just your weekly multi-hour livestreaming on Twitch—or ‘Twetch’ or however it’s pronounced. I’m old, and that’s fine—it’s not the platforms; it’s the fact that no matter where you are, you’re constantly teaching people things. And I want to be clear, that doesn’t seem like it’s in your job description, is it?Cassidy: No, but it’s something that I really care about. I really like teaching in general. A lot of the resources that I provide and the things that I do are me trying to give people things that I didn’t have when I was in the industry, trying to give advice that I wish I had, trying to give resources that I didn’t have. 
Because a lot of times, people don’t know where to look, and if I can be that person that can help them along, some of the greatest joys I’ve ever felt have been when people say, “This blog post that you wrote helped me get my first job,” or, “This thing that you said, was the kick in the pants that I needed to start my own company.” Little things like that. I love hearing it because I really just love making people successful and helping them get to that next step in their careers. And that’s my passion project, and I try to do that in all the things that I do.
Corey: There’s really something to be said about being able to reach people who have pain and have needs. I mean, the one crossover talk that I gave that really transformed the way that I saw things was “Terrible Ideas in Git” because if there’s one thing that unites frontend, backend, ops folks, data scientists, et cetera, et cetera, et cetera, it’s Git as being the common thing that no one really understands. And by teaching people how to use Git, first, it was sort of my backdoor, sneaky hack into finally having to teach myself how Git works. But then it was a problem of where, now I need to go ahead and find a way to present this in a way that’s engaging, and fun, and doesn’t require being deep into the weeds. And I was invited to speak at Frontend Conference, Zurich, which was just a surreal experience.
Incredibly nice people, very gracious community and I’m sitting there for the first half of the day watching the talks, and it’s a frontend conference and everyone’s slides are gorgeous. And this was before I started having a designer help me with my slides. So, it was always a black Helvetica text on a white background. And mine looked like crap, and I only had a few hours until my talk, so what do I do because I’m feeling incredibly out of place? I changed the font on everything to Comic Sans and leaned in on that.
And it definitely got a reaction. The talk was great. It really did work.
And it was fun. And in hindsight, I don’t think I’d do it again because I keep hearing rumors that I can’t quite confirm, but it’s significant enough that I want to be clear, that Comic Sans is apparently super accessible when it comes to people with dyslexia, and I don’t want to crap on something like that. It’s not funny when it makes people feel out of place.Cassidy: Yeah. These kinds of things, it’s delicate to talk about because you have to figure out, okay, how can I make this accessible to as many people as possible? How can I communicate this information? And then, meanwhile, when you are this person, that just means your DMs are very, very full of people who want one-on-one help and you have to figure out how to scale yourself, and how can you make these statements that are helpful for as many people as possible, provide as many resources as you can, and hope that people don’t feel bad when you can’t answer every DM that comes your way. And yeah, there’s a delicacy when it comes to all the different things that you could be poking fun at, or saying you don’t like, and stuff, and my answer to pretty much everything has turned into just, “It depends.”Whenever people are just like, “What’s the best framework to learn?” I’m kind of like, “Eh, it depends on what you want to build.” Because first of all, that’s true, but second of all, there’s enough opinions out there in the world saying, like, “This is the worst font.” “This is the best font.” “This is the worst way to build web apps.” “This is the only way to build web apps.” I mean, you hear this constantly throughout the tech industry. And I think if more people said, “It depends,” we would be a [laugh] much happier industry in general.Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. 
ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.
Corey: I really think that you’re right, and I think the hardest part is getting there. You say that the answer to, “What framework should I pick?” is, “Well, it depends.” And that’s very true. The counterargument is that it’s also supremely unhelpful. It’s—
Cassidy: Right.
Corey: —“I’m looking to build a web page that has a form on it, and when I click a button, it does a thing.” And at that point, it feels like it’s, “Well, there is an entire field of yaks before you, all of them need to be shaved before the form will exist.” And it just becomes this, “Oh, my god, are you just trying to tell me not to bother?” And no, that’s never the response.
But having a blessed, I guess, golden path of where you can focus to get something done, and then where it makes sense to deviate gets signaled, I like that approach. But people are for some reason worried about being overly prescriptive. And I get that too.
Cassidy: Yeah, there’s a balance there. But I should append to my previous answer. I say, “It depends, but here’s how I would do it.” And that gives some direction. Some people might be just like, “Oh, well, I don’t want to use React,” or something like that, and I’m like, “Well, then, unfortunately, I can’t help you. You’re on your own. But I’m sure it’ll work for you.” And just kind of roll with it from there because you never know.
Corey: Yeah, I’ve never liked the questions where the asker already has an answer they want to hear, and they’re looking for, almost, confirmation bias.
Cassidy: Yeah.
Corey: Yeah.
Cassidy: That’s common.
Corey: At that point, why bother? Just say, “This is what I’m thinking about doing.
Please tell me it’s not ridiculous.” And if it is, people will generally try and be kinder about it. But we’ll see.Cassidy: Yeah, a lot of times, too—and I hate to say it, but a lot of times, too, people come in with such an arrogant air, and oftentimes, that’s either because they’re insecure about something, or they don’t have a lot of experience in something. But [unfortunately 00:23:27], that’s almost always the case. There have been times on my stream, for example, where someone will say, “If you use this framework, it will solve 99% of your problems.” And I’m kind of like, “Eh, will it though?” And I don’t want to just straight up say you’re wrong, but I kind of have to keep asking questions and try to be one of those teachers where I’m saying, “Okay, I’m going to ask you these questions. Are you sure that this edge case is in that 1%? I think you’re being a little bold here.” And not trying to specifically humble them, and know that they are wrong, but also turn it into a moment where you have to learn that nothing really solves 99% of your problems. [laugh].Corey: And whenever someone says something like that, I always assume conflict of interest somewhere. It’s like, “With this framework you’re suggesting, I don’t know, just so happened to integrate super well with the thing your company does? Huh, how about that?” Whenever someone can’t identify an area that they’re offering is crap in, I assume that they’re, effectively, evangelizing something with almost a religious fervor, and aren’t really people to take overly seriously. I have technologies that I adore, but if I can’t articulate use cases in which they would be wildly inappropriate, then I’m not really being fair, either to the person I’m talking to, honestly, the product itself.Cassidy: Exactly. There’s always cons. 
Yes, there might be a lot of pros and the pros may outweigh the cons, but you have to be able to speak to those if you’re going to give a credible answer to any sort of recommendation like that.Corey: So, let’s talk about platforms a little bit. You have a newsletter which I’m a fan of, and will of course link in the [show notes 00:25:05]. You stream on Twitch, which is similar to a podcast, only it’s video and it’s live so, unlike here where we can edit heavily if someone winds up breaking down crying, like I tend to every third episode—Cassidy: Yeah, we should cut out those farts earlier, by the way.Corey: Oh, yeah. Oh, we’ve already edited that out.Cassidy: Okay, great. [laugh].Corey: We’re already set. We do this in real-time here. But you have to do things like that in real-time on Twitch; as soon as something happens on camera, it’s done, it’s out there, and it’s a very different experience. You do it also on hard mode, where you and I are having a conversation back and forth, whereas when you do Twitch, you’re doing it solo. You are effectively in an empty room—or what appears to be one anyway—and you’re talking to the camera, and there’s no other audio other than you and a lovely backing track.There’s no conversation, you are monologuing for the duration of that. People mention things in the chat with a slight delay, and then you can take action based upon that. But that feels like an awful lot of pressure to wind up filling the dead air while you’re waiting for the next question to come in.Cassidy: Yeah, it’s something that has taken practice. And I think it’s something that because I have done quite a bit of public speaking, I’ve done a bunch of teaching, I am comfortable with the silence. And the music also helps that a lot. Some people when they are about to livestream, or they’re learning how to livestream for the first time they kind of panic at the silence. 
They’re like, “Oh, my gosh, how am I going to fill it?”Meanwhile, with me, I’m just like, “Ah, nobody’s asking a question. I can take a drink of water now.” And try to keep it as natural as possible. I try to make this stream—I started doing it more regularly during the pandemic, as something that’s kind of just co-working and kind of having something in the background, because usually when people are in the office or working at a cafe or something, you get to hear interesting conversations, and a voice, and you can chime in on occasion. And I try to make that what the stream is where people don’t have to be paying excessive attention, but I open it up where you can ask me pretty much anything and I will give you an honest answer, and just try to make it a space where people can not worry about asking a stupid question because I think that none of these questions, whether it’s about tech jobs, or certain frameworks, or opinions about things, none of them are dumb.Sometimes it’s just people who aren’t sure what the answer should be, or they aren’t sure if their biases are correct or anything like that. And I really enjoy the livestream because it gives me a connection with the community that I can help teach further. And then as they ask questions, I can take that and run with it, and build a demo, help them come up with project ideas, show how I would build something, something like that.Corey: Oh, there’s an incredible authenticity to what you do, and that is, I think, one of the most impressive aspects of what you do. I’ve never yet seen you make someone feel like a jerk for asking a question. I’ve also never once seen you claim you knew how something worked when you didn’t. You point people at resources to find the right answer. You are constantly gracious, you’re always incredibly authentic, and it’s become really easy to consume your materials because I know you’re not going to make it up if you don’t know the answer. 
And that’s no small thing.Cassidy: Thank you. [laugh]. I appreciate that. It’s not easy, but it’s very fun. And I do hope that it makes people more comfortable with the concept of streaming, coding, and any of that.Corey: You also seem to have some of the same problems that I do, specifically—not the jerk problems. That’s unique to me—but the problem in the context of answering a difficult question, namely, “So, what is it you do?” Because as mentioned, you have the newsletter, you have the job, you have the Twitch stream, you have the TikTok, you have the Twitter. You do courses from time to time, if I’m not mistaken, as well?Cassidy: Yes, I do. I have a few online courses on Scrimba, on Udemy, on Skillshare, on O’Reilly. I like teaching JavaScript and showing people how React works, and stuff, under the hood. And you’re right, it’s hard to explain what I do sometimes. [laugh].Corey: And that’s the hard part is when someone asks, “So, what do you do?” What’s your default answer?Cassidy: I have created this tagline that I’m kind of just sticking with, and we’ll see how long it lasts me. But I say, “I make memes, streams, and software.” And I just kind of leave it at that, and people be like, “Okay, Cassidy, shut up.” [laugh] and I leave it at that. But yeah, if someone asks me what I do, I kind of start with, “I code.”And then if they press further, I’ll be like, “Well, I teach people how to code, and I show people how to code best.” And usually, that’s where my grandpa stops asking. He’s just like, “Okay, it’s that computer stuff.” If it’s a tech person, I start diving more and more into all of the things, and it’s very hard to explain. I wish there were a word for trying to make people laugh, and teach, and build things, and stuff, but I don’t know what that word would be.Corey: Yeah, it’s a hard problem. 
My answer has always been to spin it depending upon who I’m talking to.Cassidy: True.Corey: If it’s at a neighborhood barbecue and people ask what I do, I try and make myself sound like some sort of esoteric accountant because if I say even slightly incorrectly what I do, suddenly people are asking me about their printers. And honestly, how do I fix a printer? I throw it away and I buy a new one, but that’s not really helpful to people who are looking for actual help. So, it’s a matter of aligning what I do with people’s expectations. “I make fun of Amazon for a living,” is technically accurate, but boy does that get some strange looks.Cassidy: [laugh]. Yeah, it definitely, definitely varies on the audience. If I’m, for example, going to some kind of church barbecue, I just say, oh, I’m a software engineer. Questions stop there, and I leave it at that. If I’m at a tech meetup, I’ll be just like, “Oh, well, I specialize on frontend things, but I also do some dev advocacy and stuff.”And I can generally stop there. But you’re right, depending on the audience, I have to be careful because I don’t want people to just ask me to fix their WiFi all the time, even though they do anyway. And to them, I usually say, oh, I build computer things. I don’t know how to work them, though. And I leave it at that.Corey: Oh, hey, I’m building a computer, too. Can you recommend some parts? Absolutely not. Is my—Cassidy: Nope.Corey: —I don’t know what I’m doing there.Cassidy: I kind of just Google and accept whatever I’m told. [laugh].Corey: Yeah. And the other side of it, too, is if you’re not direct enough and say, “Working with technology,” people tend to think that you’re being condescending. It’s like, “Oh, I do some cloud computing finance work.” And they’re like, “Oh, so what, you fix an AWS bill?” Yeah, exactly. “You could just say that, you know?” “Well, yeah. 
To you, but there’s a whole world of people out there to whom that sounds like I’m blowing them off with geekspeak.”Cassidy: Yeah. Yeah, exactly. And it’s almost harder if it’s a mixed group of people, too, because sometimes people who are in tech but I don’t know the rest of the people, they might say, “Oh, she makes tech jokes on Twitter.” And they’ll say, “Oh, really? Say something funny.” I’m like, “Uh—I don’t know how.” [laugh]. It’s not that easy. It’s interesting trying to figure out how you define that for other people.Corey: “Oh, you’re a comedian. Great. Make me laugh.” Like, “Oh, God.”Cassidy: Just please, no.Corey: Yeah, that’s the best setup for a good belly laugh is command performance of, be funny when you weren’t expecting it?Cassidy: Yeah. Ugh, can’t handle it. I just freeze up and give up.Corey: Ugh. Again, these are not common problems. One thing that I did find incredibly funny was that when we started talking, we talked about things that we had encountered as we wound up going through expanding audiences on Twitter and whatnot. And you sent a screenshot, at one point, of tracking your Twitter follower count over time in a private Slack channel that you had. And you said, “This is ridiculous, and no one ever does it.” And then I responded with a screenshot of me doing the exact same thing, which is—Cassidy: So funny.Corey: —first, hilarious because I’ve never seen anyone else do that. And, two, a bit of product feedback, perhaps, for the team at Twitter.Cassidy: It really is. Yeah, no, when I found out you did that, too, I laughed so hard because so many times people have been just like, “You know there’s tools for this? You don’t have to just write a number in DM to yourself on Slack.” But this is the tool that works for me. It’s quick. It’s done. I can see, generally, how things are going. 
Someday I should put it in a graph of some type, but eh.Corey: But it’s always forward-looking, too, because all those tools don’t go back in time to your account’s inception. And, “Oh, you had this person follow you at this time.” There’s no historical record there.Cassidy: Yeah. It is totally product feedback. I have no idea how I’d be able to say, “Hey, look at this DM, fix this problem,” to a specific Twitter person, but, eh.Corey: Four years ago, I had 1500 Twitter followers and it had taken me seven years to do it. And people ask, “What were the big inflection points when you wound up getting significant audience boosts?” And if I had dates on that stuff, I could absolutely do some correlation like, “Oh, there’s re:Invent.” “Oh, that’s where I was visibly thrown out of a bar on the news.” Kidding. But being able to tie it to things like that would be helpful, but it’s happened, it’s gone. I just have to basically try and remember, and assume I’m somewhat close to accurate.Cassidy: Yeah. And I don’t do it consistently, mind you, there’s definitely weeks where I just totally miss it. But sometimes, for example, if I’m about to tweet something funny, I’ll mark it and then make the post and just see where it goes. And it’s more just interesting for me; I will probably never share this with people, besides you when we talk about our [laugh] strategies. But yeah, I mean, I guess that also speaks to building what’s best for you is often the best solution.Corey: Yeah, and it changes, too. And the part of the reason that these conversations tend to happen behind closed doors because the easy, naive response is, “Oh, that’d be super interesting to watch and see how those problems get addressed.” But so much of what we’re doing and how we approach it is not helpful until you’re at a certain point of scale. 
If you have 200 Twitter followers, for example, frankly, you’re making better life choices than either one of us are, but the things that we are concerned with and have to pay attention to, just don’t apply in any meaningful way.Cassidy: Right.Corey: Conversely, if you have a small following Twitter account, that is a freedom that we don’t really have because past a certain point, as I’m sure you can attest, you can’t say that you like waffles without getting someone asking, “Well, what’s the problem you have with pancakes?” And then insulting you and following you around until you block them.Cassidy: It’s so true. I was talking with someone about this yesterday because it’s not like I ever say things that are particularly controversial or anything, but word choice matters so much more when there are a lot of eyes on you. And so many times I’ll make a joke, and then I have to do a follow-up tweet saying, “This is a joke. Please don’t tell me how to exit vim.” Or something like that. Because oh, my word. People just will never take things the right way en masse.Corey: No, I have learned there’s no possible way to say something without it being misinterpreted. And I try and wind up turning it back around, and every time I read something, I do my best to assume good faith. I don’t always succeed, and sometimes I look like a fool for basically taking a troll seriously, but I’d rather that than the alternative of someone asks a naive question, and I assume they’re just being a jerk and block them or I mock them. Because the failure mode of me looking like I got hoodwinked is better than making someone else feel crappy.Cassidy: Right. Exactly. I remember a while ago, this was, like, a couple years ago, there was someone who was not being nice to me in the mentions, and I was just like, “Why would you respond to me like this? 
Just leave me alone.” I said something like that.And it was a lesson for me and for them, where they ended up getting really upset with me and yelling at me in the DMs because they were getting all of this negative commentary on there and for being the mean one, and then I end up looking like a jerk because I ended up spotlighting this person who might have been having a bad day. You never know. And the algorithm works against you when you have a lot of eyes who are looking at what you’re tweeting about. And so, yeah, whenever stuff like that happens, you kind of just have to ignore it and learn to pick your battles, I guess.Corey: Oh, yeah. And I assume that’s going on now. I imagine that one day, the AWS Twitter account is going to finally snap and just quote-tweet me with some incredible roast and there will be no coming back from that for me. I look forward to that day. It would be so nice to see that come out of them. I worry I may die before it gets there, but hope springs eternal.Cassidy: [laugh].Corey: Cassidy, thank you so much for taking the time to speak with me. If people want to hear more about what you have to say—as they damn well should—where can they find you? Take a deep breath; run through the list.Cassidy: All right, they can find me on all sorts of platforms. You could look up Cassidy Williams, and you’ll find either me or a Scooby-Doo character, and I’m not the Scooby-Doo character. Or you could look up cassidoo—C-A-S-S-I-D-O-O—cassidoo.co is my website, cassidoo on Twitter, on GitHub, on CodePen, on LinkedIn, all those platforms. That’s where you can find me.Corey: And we will put links to all of those things in the [show notes 00:38:03] because honestly, that’s someone else’s job, and I am going to hurl that mess to them.Cassidy: [laugh]. Perfect.Corey: Thank you so much for taking the time to speak with me. I really appreciate that.Cassidy: It was really fun. It was good chatting with you, too.Corey: It really was. 
Cassidy Williams, principal developer experience engineer at Netlify. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an aggressive comment encouraging me to fight you on Twitch, however that might work.Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold.This has been a HumblePod production. Stay humble.
About NickNick Frichette is a Penetration Tester and Team Lead for State Farm. Outside of work he does vulnerability research. His current primary focus is developing techniques for AWS exploitation. Additionally he is the founder of hackingthe.cloud which is an open source encyclopedia of the attacks and techniques you can perform in cloud environments.Links: Hacking the Cloud: https://hackingthe.cloud/ Determine the account ID that owned an S3 bucket vulnerability: https://hackingthe.cloud/aws/enumeration/account_id_from_s3_bucket/ Twitter: https://twitter.com/frichette_n Personal website:https://frichetten.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. 
When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I spend a lot of time throwing things at AWS in varying capacities. One area I don’t spend a lot of time giving them grief is in the InfoSec world because as it turns out, they—and almost everyone else—don’t have much of a sense of humor around things like security. My guest today is Nick Frichette, who’s a penetration tester and team lead for State Farm. Nick, thanks for joining me.Nick: Hey, thank you for inviting me on.Corey: So, like most folks in InfoSec, you tend to have a bunch of different, I guess, titles or roles that hang on signs around someone’s neck. And it all sort of distills down, on some level—in your case, at least, and please correct me if I’m wrong—to ‘cloud security researcher.’ Is that roughly correct? Or am I missing something fundamental?Nick: Yeah. 
So, for my day job, I do penetration testing, and that kind of puts me up against a variety of things, from web applications, to client-side applications, to sometimes the cloud. In my free time, though, I like to spend a lot of time on security research, and most recently been focusing pretty heavily on AWS.Corey: So, let’s start at the very beginning. What is a cloud security researcher? “What is it you’d say it is you do here?” For lack of a better phrasing?Nick: Well, to be honest, the phrase ‘security researcher’ or ‘cloud security researcher’ has been, kind of… I guess watered down in recent years; everybody likes to call themselves a researcher in some way or another. You have some folks who participate in the bug bounty programs. So, for example, GCP and Azure have their own bug bounties. AWS does not, and not too sure why. And so they want to find vulnerabilities with the intention of getting cash compensation for it.You have other folks who are interested in doing security research to try and better improve defenses and alerting and monitoring so that when the next major breach happens, they’re prepared or they’ll be able to stop it ahead of time. From what I do, I’m very interested in offensive security research. So, how can I, as a penetration tester, or red teamer or, I guess, an actual criminal, [laugh] how can I take advantage of AWS, or try to avoid detection from services like GuardDuty and CloudTrail?Corey: So, let’s break that down a little bit further. I’ve heard the term of ‘red team versus blue team’ used before. Red team—presumably—is the offensive security folks—and yes, some of those people are, in fact, quite offensive—and blue team is the defense side. In other words, keeping folks out. Is that a reasonable summation of the state of the world?Nick: It can be, yeah, especially when it comes to security. 
One of the nice parts about the whole InfoSec field—I know a lot of folks tend to kind of just say, “Oh, they’re there to prevent the next breach,” but in reality, InfoSec has a ton of different niches and different job specialties. “Blue teamers,” quote-unquote, tend to be the defense side working on ensuring that we can alert and monitor potential attacks, whereas red teamers—or penetration testers—tend to be the folks who are trying to do the actual exploitation or develop techniques to do that in the future.Corey: So, you talk a bit about what you do for work, obviously, but what really drew my notice was stuff you do that isn’t part of your core job, as best I understand it. You’re focused on vulnerability research, specifically with a strong emphasis on cloud exploitation, as you said—AWS in particular—and you’re the founder of Hacking the Cloud, which is an open-source encyclopedia of various attacks and techniques you can perform in cloud environments. Tell me about that.Nick: Yeah, so Hacking the Cloud came out of a frustration I had when I was first getting into AWS, that there didn’t seem to be a ton of good resources for offensive security professionals to get engaged in the cloud. By comparison, if you wanted to learn about web application hacking, or attacking Active Directory, or reverse engineering, if you have a credit card, I can point you in the right direction. But there just didn’t seem to be a good course or introduction to how you, as a penetration tester, should attack AWS. There’s things like, you know, open S3 buckets are a nightmare, or that server-side request forgery on an EC2 instance can result in your organization being fined very, very heavily. 
I kind of wanted to go deeper with that.And with Hacking the Cloud, I’ve tried to gather a bunch of offensive security research from various blog posts and conference talks into a single location, so that both the offense side and the defense side can kind of learn from it and leverage that to either improve defenses or look for things that they can attack.Corey: It seems to me that doing things like that is not likely to wind up making a whole heck of a lot of friends over on the cloud provider side. Can you talk a little bit about how what you do is perceived by the companies you’re focusing on?Nick: Yeah. So, in terms of relationship, I don’t really have too much of an idea of what they think. I have done some research and written on my blog, as well as published to Hacking the Cloud, some techniques for doing things like abusing the SSM agent, as well as abusing the AWS API to enumerate permissions without logging into CloudTrail. And ironically, through the power of IP addresses, I can see when folks from the Amazon corporate IP address space look at my blog, and that’s always fun, especially when there’s, like, four in the course of a couple of minutes, or five or six. But I don’t really know too much about what they—or how they view it, or if they think it’s valuable at all. I hope they do, but really not too sure.Corey: I would imagine that they do, on some level, but I guess the big question is, you know that someone doesn’t like what you’re doing when they send, you know, cease and desist notices, or have the police knock on your door. I feel like at most levels, we’re past that in an InfoSec level, at least I’d like to believe we are. We don’t hear about that happening all too often anymore. But what’s your take on it?Nick: Yeah, I definitely agree. I definitely think we are beyond that. 
Most companies these days know that vulnerabilities are going to happen, no matter how hard you try and how much money you spend, and so it’s better to be accepting of that and open to it. And especially because the InfoSec community can be so, say, noisy at times, it’s definitely worth it to pay attention, definitely be appreciative of the information that may come out. AWS is pretty awesome to work with, having disclosed to them a couple times, now.They have a safe harbor provision, which essentially says that so long as you’re operating in good faith, you are allowed to do security testing. They do have some rules around that, but they are pretty clear in terms of if you were operating in good faith, you wouldn’t be doing anything like that. It tends to be pretty obviously malicious things that they’ll ask you to stop.Corey: So, talk to me a little bit about what you’ve found lately, and been public about. There have been a number of examples that have come up whenever people start googling your name or looking at things you’ve done. But what’s happening lately? What have you found that’s interesting?Nick: Yeah. So, I think most recently, the thing that’s kind of gotten the most attention has been a really interesting bug I found in the AWS API. Essentially, kind of the core of it is that when you are interacting with the API, obviously that gets logged to CloudTrail, so long as it’s compatible. So, if you are successful, say you want to do, like, Secrets Manager, ListSecrets, that shows up in CloudTrail. And similarly, if you do not have that permission on a role or user and you try to do it, that access denied also gets logged to CloudTrail.Something kind of interesting that I found is that by manually modifying a request, or mal-forming them, what we can do is we can modify the content-type header, and as a result when you do that—and you can provide literally gibberish. 
I think I have VS Code window here somewhere with a content-type of ‘meow’—when you do that, the AWS API knows the action that you’re trying to call because of that messed up content type, it doesn’t know exactly what you’re trying to do and as a result, it doesn’t get logged to CloudTrail. Now, while that may seem kind of weirdly specific and not really, like, a concern, the nice part of it though is that for some API actions—somewhere in the neighborhood of 600. I say ‘in the neighborhood of’ just because it fluctuates over time—as a result of that, you can tell if you have that permission, or if you don’t without that being logged to CloudTrail. And so we can do this enumeration of permissions without somebody in the defense side seeing us do it. Which is pretty awesome from a offensive security perspective.Corey: On some level, it would be easy to say, “Well, just not showing up in the logs isn’t really a security problem at all.” I guess that you disagree?Nick: I do, yeah. So, let’s sort of look at it from a real-world perspective. Let’s say, Corey, you’re tired of saving people money on their AWS bill, you’d instead maybe want to make a little money on the side and you’re okay with perhaps, you know, committing some crimes to do it. Through some means you get access to a company’s AWS credentials for some particular role, whether that’s through remote code execution on an EC2 instance, or maybe find them in an open location like an S3 bucket or a Git repository, or maybe you phish a developer, through some means, you have an access key and a secret access key. The new problem that you have is that you don’t know what those credentials are associated with, or what permissions they have.They could be the root account keys, or they could be literally locked down to a single S3 bucket to read from. It all just kind of depends. Now, historically, your options for figuring that out are kind of limited. 
Your best bet would be to brute-force the AWS API using a tool like Pacu, or my personal favorite, which is enumerate-iam by Andres Riancho. And what that does is it just tries a bunch of API calls and sees which one works and which one doesn’t.And if it works, you clearly know that you have that permission. Now, the problem with that, though, is that if you were to do that, that’s going to light up CloudTrail like a Christmas tree. It’s going to start showing all these access denieds for these various API calls that you’ve tried. And obviously, any defender who’s paying attention is going to look at that and go, “Okay. That’s, uh, that’s suspicious,” and you’re going to get shut down pretty quickly.What’s nice about this bug that I found is that instead of having to litter CloudTrail with all these logs, we can just do this enumeration for roughly 600-ish API actions across roughly 40 AWS services, and nobody is the wiser. You can enumerate those permissions, and if they work fantastic, and you can then use them, and if you come to find you don’t have any of those 600 permissions, okay, then you can decide on where to go from there, or maybe try to risk things showing up in CloudTrail.Corey: CloudTrail is one of those services that I find incredibly useful, or at least I do in theory. In practice, it seems that things don’t show up there, and you don’t realize that those types of activities are not being recorded until one day there’s an announcement of, “Hey, that type of activity is now recorded.” As of the time of this recording, the most recent example that in memory is data plane requests to DynamoDB. It’s, “Wait a minute. You mean that wasn’t being recorded previously? Huh. I guess it makes sense, but oh, dear.”And that causes a reevaluation of what’s happening in the—from a security policy and posture perspective for some clients. There’s also, of course, the challenge of CloudTrail logs take a significant amount of time to show up. 
It used to be over 20 minutes, I believe now it’s closer to 15—but don’t quote me on that, obviously. Run your own tests—which seems awfully slow for anything that’s going to be looking at those in an automated fashion and taking a reactive or remediation approach to things that show up there. Am I missing something key?
Nick: No, I think that is pretty spot on. And believe me, [laugh] I am fully aware of how long CloudTrail takes to populate, especially with doing a bunch of research on what is and what is not logged to CloudTrail. I know that there are some operations that can be logged more quickly than the 15-minute average. Off the top of my head, though, I actually don’t quite remember what those are. But you’re right, in general, the majority at least do take quite a while.
And that’s definitely time in which an adversary, or someone like me, could maybe take advantage of that 15-minute window to try and brute force those permissions, see what we have access to, and then try to operate and get out with whatever goodies we’ve managed to steal.
Corey: Let’s say that you’re doing the thing that you do, however that comes to be—and I am curious—actually, we’ll start there. I am curious; how do you discover these things? Is it looking at what is presented and then figuring out, “Huh, how can I wind up subverting the system it’s based on?” And, similar to the way that I take a look at any random AWS service and try and figure out how to use it as a database? How do you find these things?
Nick: Yeah, so to be honest, it all kind of depends. Sometimes it’s completely by accident. So, for example, the API bug I described about not logging to CloudTrail, I actually found that due to [laugh] copying and pasting code from AWS’s website, and I didn’t change the content-type header. And as a result, I happened to notice this weird behavior, and kind of took advantage of it. 
Other times, it’s thinking a little bit about how something is implemented and the security ramifications of it.So, for example, the SSM agent—which is a phenomenal tool in order to do remote access on your EC2 instances—I was sitting there one day and just kind of thought, “Hey, how does that authenticate exactly? And what can I do with it?” Sure enough, it authenticates the exact same way that the AWS API does, that being the metadata service on the EC2 instance. And so what I figured out pretty quickly is if you can get access to an EC2 instance, even as a low-privilege user or you can do server-side request forgery to get the keys, or if you just have sufficient permissions within the account, you can potentially intercept SSM messages from, like, a session and provide your own results. And so in effect, if you’ve compromised an EC2 instance, and the only way, say, incident response has into that box is SSM, you can effectively lock them out of it and, kind of, do whatever you want in the meantime.Corey: That seems like it’s something of a problem.Nick: It definitely can be. But it is a lot of fun to play keep-away with incident response. [laugh].Corey: I’d like to reiterate that this is all in environments you control and have permissions to be operating within. It is not recommended that people pursue things like this in other people’s cloud environments without permissions. I don’t want to find us sued for giving crap advice, and I don’t want to find listeners getting arrested because they didn’t understand the nuances of what we’re talking about.Nick: Yes, absolutely. Getting legal approval is really important for any kind of penetration testing or red teaming. I know some folks sometimes might get carried away, but definitely be sure to get approval before you do any kind of testing.Corey: So, how does someone report a vulnerability to a company like AWS?Nick: So AWS, at least publicly, doesn’t have any kind of bug bounty program. 
But what they do have is a vulnerability disclosure program. And that is essentially an email address that you can contact and send information to, and that’ll act as your point of contact with AWS while they investigate the issue. And at the end of their investigation, they can report back with their findings, whether they agree with you and they are working to get that patched or fixed immediately, or if they disagree with you and think that everything is hunky-dory, or if you may be mistaken.Corey: I saw a tweet the other day that I would love to get your thoughts on, which said effectively, that if you don’t have a public bug bounty program, then any way that a researcher chooses to disclose the vulnerability is definitionally responsible on their part because they don’t owe you any particular duty of care. Responsible disclosure, of course, is also referred to as, “Coordinated vulnerability disclosure” because we’re always trying to reinvent terminology in this space. What do you think about that? Is there a duty of care from security researchers to responsibly disclose the vulnerabilities they find, or coordinate those vulnerabilities with vendors in the absence of a public bounty program on turning those things in?Nick: Yeah, you know, I think that’s a really difficult question to answer. From my own personal perspective, I always think it’s best to contact the developers, or the company, or whoever maintains whatever you found a vulnerability in, give them the best shot to have it fixed or repaired. Obviously, sometimes that works great, and the company is super receptive, and they’re willing to patch it immediately. And other times, they just don’t respond, or sometimes they respond harshly, and so depending on the situation, it may be better for you to release it publicly with the intention that you’re informing folks that this particular company or this particular project may have an issue. 
On the flip side, I can kind of understand—although I don’t necessarily condone it—why folks pursue things like exploit brokers, for example.
So, if a company doesn’t have a bug bounty program, and the researcher isn’t expecting any kind of, like, cash compensation, I can understand why they may spend tens of hours, maybe hundreds of hours chasing down a particularly impactful vulnerability, only to maybe write a blog post about it or get a little head pat and say, “Thanks, nice work.” And so I can see why they may pursue things like selling to an exploit broker who may pay them a hefty sum, if it is a—
Corey: Orders of magnitude more. It’s, “Oh, good. You found a way to remotely execute code across all of EC2 in every region”—that is a hypothetical; don’t email me—“have a t-shirt.” It seems like you could basically buy all the t-shirts for [laugh] what that is worth on the exploit market.
Nick: Yes, absolutely. And I do know from some experience that folks will reach out to you and are interested in, particularly, some cloud exploits. Nothing, like, minor, like some of the things that I’ve found, but thinking more of, like, accessing resources without anybody knowing or accessing resources cross-account; that could go for quite a hefty sum.
Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.
Corey: It always feels squicky, on some level, to discover something like this that’s kind of neat, and wind up selling it to basically some arguably terrible people. Maybe. 
We don’t know who’s buying these things from the exploit broker. Counterpoint, having reported a few security problems myself to various providers, you get an autoresponder, then you get a thank you email that goes into a bit more detail—for the well-run programs, at least—and invariably, the company’s position is that whatever you found is not as big of a deal as you think it is, and therefore they see no reason to publish it or go loud with it. Wouldn’t you agree?
Because, on some level, their entire position is, please don’t talk about any security shortcomings that you may have discovered in our system. And I get why they don’t want that going loud, but by the same token, security researchers need a reputation to continue operating on some level in the market as security researchers, especially independents, especially people who are trying to make names for themselves in the first place.
Nick: Yeah.
Corey: How do you resolve that dichotomy yourself?
Nick: Yeah, so, from my perspective, I totally understand why a company or project wouldn’t want you to publicly disclose an issue. Everybody wants to look good, and nobody wants to be called out for any kind of issue that may have been unintentionally introduced. I think the thing at the end of the day, though, from my perspective, is that if I, as some random guy in the middle of nowhere, Illinois, find a bug, or to be frank, if anybody out there finds a vulnerability in something, then a much more sophisticated adversary is equally capable of finding such a thing. And so it’s better to have these things out in the open and discussed, rather than hidden away, so that we have the best chance of anybody being able to defend against it or develop detections for it, rather than just kind of being like, “Okay, the vendor didn’t like what I had to say, I guess I’ll go back to doing whatever [laugh] things I normally do.”
Corey: You’ve obviously been doing this for a while. 
And I’m going to guess that your entire security researcher career has not been focused on cloud environments in general and AWS in particular.
Nick: Yes, I’ve done some other stuff in relation to abusing GitLab Runners. I also happened to find a pretty neat RCE and privilege escalation in the very popular open-source project Pi-hole. Not sure if you have any experience with that.
Corey: Oh, I run it myself all the time for various DNS blocking purposes and other sundry bits of nonsense. Oh, yes, good. But what I’m trying to establish here is that this is not just one or two companies that you’ve worked with. You’ve done this across the board, which means I can ask a question without naming and shaming anyone, even implicitly. What differentiates good vulnerability disclosure programs from terrible ones?
Nick: Yeah, I think the major differentiator is the reactivity of the project, as in how quickly they respond to you. There are some programs I’ve worked with where you disclose something, maybe even something that might be of a high severity, and you might not hear back for weeks at a time, whereas there are other programs, particularly the MSRC—which is a part of Microsoft—or with AWS’s disclosure program, where within the hour, I had a receipt of, “Hey, we received this, we’re looking into it.” And then within a couple hours after that, “Yep, we verified it. We see what you’re seeing, and we’re going to look at it right away.” I think that’s definitely one of the major differentiators for programs.
Corey: Are there any companies you’d like to call out in either direction—and, “No,” is a perfectly valid [laugh] answer to this one—for having excellent disclosure programs versus terrible ones?
Nick: I don’t know if I’d like to call anybody out negatively. But in support, I have definitely appreciated working with both AWS’s and the MSRC—Microsoft’s—I think both of them have done a pretty fantastic job. 
And they definitely know what they’re doing at this point.
Corey: Yeah, I must say that I primarily focus on AWS and have for a while, which should be blindingly obvious to anyone who’s listened to me talk about computers for more than three and a half minutes. But my experiences with the security folks at AWS have been uniformly positive, even when I find things that they don’t want me talking about, that I will be talking about regardless, they’ve always been extremely respectful, and I have never walked away from the conversation thinking that I was somehow cheated by the experience. In fact, a couple of years ago at the last in-person re:Invent, I got to give a talk around something I reported specifically about how AWS runs its vulnerability disclosure program with one of their security engineers, Zach Glick, and he was phenomenally transparent around how a lot of these things work, and what they care about, and how they view these things, and what their incentives are. And obviously being empathetic to people reporting things in with the understanding that there is no duty of care that when security researchers discover something, they then must immediately go and report it in return for a pat on the head and a thank you. It was really neat being able to see both sides simultaneously around a particular issue. I’d recommend it to other folks, except I don’t know how you make that lightning strike twice.
Nick: It’s very, very wise. Yes.
Corey: Thank you. I do my best. So, what’s next for you? You’ve obviously found a number of interesting vulnerabilities around information disclosure. One of the more recent things that I found that was sort of neat as I trolled the internet—I don’t believe it was yours, but there was an ability to determine the account ID that owned an S3 bucket by enumerating by a binary search. Did you catch that at all?
Nick: I did. 
That was by Ben Bridts, which is—it’s a pretty awesome technique, and that’s been something I’ve been kind of interested in for a while. There is an ability to enumerate users’ roles and service-linked roles inside an account, so long as you have the account ID. The problem, of course, is getting the account ID. So, when Ben put that out there I was super stoked about being able to leverage that now for enumeration and maybe some fun phishing tricks with that.
Corey: I love the idea. I love seeing that sort of thing being conducted. And AWS’s official policy, as best I remember when I looked at this once, is that account IDs are not considered confidential. Do you agree with that?
Nick: Yep. That is my understanding of how AWS views it. From my perspective, having an account ID can be beneficial. I mentioned that you can enumerate users’ roles and service-linked roles with it, and that can be super useful from a phishing perspective. The average phishing email looks like, “Oh, you won an iPad,” or, “Oh, you’re the 100th visitor of some website,” or something like that.
But imagine getting an email that looks like it’s from something like AWS developer support, or from some research program that they’re doing, and they can say to you, like, “Hey, we see that you have these roles in your account with account ID such-and-such, and we know that you’re using EKS, and you’re using ECS,” that phishing email becomes a lot more believable when suddenly this outside party seemingly knows so much about your account. And that might be something that you would think, “Oh, well only a real AWS employee or AWS would know that.” So, from my perspective, I think it’s best to try and keep your account ID secret. I actually redact it from every screenshot that I publish, or at the very least, I try to. 
At the same time, though, it’s not the kind of thing that’s going to get somebody in your account in a single step, so I can totally see why some folks aren’t too concerned about it.
Corey: I feel like we also got a bit of a red herring coming from AWS blog posts themselves, where they always will give screenshots explaining what they do, and redact the account ID in every case. And the reason that I was told at one point was, “Oh, we have an internal provisioning system that’s different. It looks different, and I don’t want to confuse people whenever I wind up doing a screenshot.” And that’s great, and I appreciate that. And part of me wonders on one level how accurate is that?
Because sure, I understand that you don’t necessarily want to distract people with something that looks different, but then I found out that the system is called Isengard and, yeah, it’s great. They’ve mentioned it periodically in blog posts, and talks, and the rest. And part of me now wonders, oh, wait a minute. Is it actually because they don’t want to disclose the differences between those systems, or is it because they don’t have license rights publicly to use the word Isengard and don’t want to get sued by whoever owns the rights to the Lord of the Rings trilogy? So, one wonders what the real incentives are in different cases. But I’ve always viewed account IDs as being the sort of thing that, eh, you probably don’t want to share them around all the time, but it also doesn’t necessarily hurt.
Nick: Exactly, yeah. It’s not the kind of thing you want to share with the world immediately, but it doesn’t really hurt in the end.
Corey: There was an early time when the partner network was effectively determining tiers of partner by how much spend they influenced, and the way that you demonstrated that was by giving account IDs for your client accounts. The only verification at the time, to my understanding, was that, “Yep, that mapped to the client you said it did.” And that was it. 
So, I can understand back in those days not wanting to muddy those waters. But those days are also long past.
So, I get it. I’m not going to be the first person to advertise mine, but if you can discover my account ID by looking at a bucket, it doesn’t really keep me up at night.
So, all of those things considered, we’ve had a pretty wide-ranging conversation here about a variety of things. What’s next? What interests you as far as where you’re going to start looking and exploring—and exploiting as the case may be—various cloud services? hackthe.cloud—which there is the dot in there, which also turns it into a domain; excellent choice—is absolutely going to be a great collection for a lot of what you find and for other people to contribute and learn from one another. But where are you aimed at? What’s next?
Nick: Yeah, so one thing I’ve been really interested in has been fuzzing the AWS API. As anyone who’s ever used AWS before knows, there are hundreds of services with thousands of potential API endpoints. And so from a fuzzing perspective, there is a wide variety of things for us to potentially affect or potentially find vulnerabilities in. I’m currently working on a library that will allow me to make that fuzzing a lot easier. You could use things like botocore, Boto3, like, some of the AWS SDKs.
The problem though, is that those are designed for, sort of like, the happy path where you can format your request the way Amazon wants. As a security researcher or as someone doing fuzzing, I kind of want to send random gibberish sometimes, or I want to malform my requests. And so that library is still in production, but it has already resulted in a bug. While I was fuzzing part of the AWS API, I happened to notice that I broke Elastic Beanstalk—quite literally—when [laugh] I was going through the AWS console, I got the big red error message of, “[unintelligible 00:29:35] that request parameter is null.” And I was like, “Huh. 
Well, why is it null?”
And come to find out as a result of that, there is an HTML injection vulnerability in the Elastic—well, there was an HTML injection vulnerability in the Elastic Beanstalk, for the AWS console. Pivoting from there, the Elastic Beanstalk uses Angular 1.8.1, or at least it did when I found it. As a result of that, we can modify that HTML injection to do template injection. And for the AngularJS crowd, template injection is basically cross-site scripting [laugh] because there is no sandbox anymore, at least in that version. And so as a result of that, I was able to get cross-site scripting in the AWS console, which is pretty exciting. That doesn’t tend to happen too frequently.
Corey: No, that is not a typical issue that winds up getting disclosed very often.
Nick: Definitely, yeah. And so I was excited about it, and considering the fact that my library for fuzzing is literally, like, not even halfway done, or is barely halfway done, I’m looking forward to what other things I can find with it.
Corey: I look forward to reading more. And at the time of this recording, I should point out that this has not been finalized or made public, so I’ll be keeping my eyes open to see what happens with this. And hopefully, this will be old news by the time this episode drops. If not, well, [laugh] this might be an interesting episode once it goes out.
Nick: Yeah. I hope they’d have it fixed by then. They haven’t responded to it yet other than the, “Hi, we’ve received your email. Thanks for checking in.” But we’ll see how that goes.
Corey: Watching news as it breaks is always exciting. If people want to learn more about what you’re up to, and how you go about things, where can they find you?
Nick: Yeah, so you can find me at a couple different places. On Twitter I’m @frichette_n. I also write a blog where I contribute a lot of my research at frechetten.com as well as Hacking the Cloud. I contribute a lot of the AWS stuff that gets thrown on there. 
And it’s also open-source, so if anyone else would like to contribute or share their knowledge, you’re absolutely welcome to do so. Pull requests are open and excited for anyone to contribute.Corey: Excellent. And we will of course include links to that in the [show notes 00:31:42]. Thank you so much for taking the time to speak with me. I really appreciate it.Nick: Yeah, thank you so much for inviting me on. I had a great time.Corey: Nick Frechette, penetration tester and team lead for State Farm. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment telling me why none of these things are actually vulnerabilities, but simultaneously should not be discussed in public, ever.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Christina Christina Maslach, PhD, is a Professor of Psychology (Emerita) and a researcher at the Healthy Workplaces Center at the University of California, Berkeley.  She received her A.B. from Harvard, and her Ph.D. from Stanford.  She is best known as the pioneering researcher on job burnout, producing the standard assessment tool (the Maslach Burnout Inventory, MBI), books, and award-winning articles.  The impact of her work is reflected by the official recognition of burnout, as an occupational phenomenon with health consequences, by the World Health Organization in 2019.  In 2020, she received the award for Scientific Reviewing, for her writing on burnout, from the National Academy of Sciences.  Among her other honors are: Fellow of the American Association for the Advancement of Science (1991 -- "For groundbreaking work on the application of social psychology to contemporary problems"), Professor of the Year (1997), and the 2017 Application of Personality and Social Psychology Award (for her research career on job burnout).  Links: The Truth About Burnout: https://www.amazon.com/Truth-About-Burnout-Organizations-Personal/dp/1118692136 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. 
It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. 
One subject that I haven’t covered in much depth on this show has been a repeated request from the audience, and that is to talk a bit about burnout. So, when I asked the audience who I should talk to about burnout, there were really two categories of responses. The first was, “Pick me. I hate my job, and I’d love to talk about that.” And the other was, “You should speak to Professor Maslach.” Christina Maslach is a Professor of Psychology at Berkeley. She’s a teacher and a researcher, particularly in the area of burnout. Professor, welcome to the show.Dr. Maslach: Well, thank you for inviting me.Corey: So, I’m going to assume from the outset that the reason that people suggest that I speak to you about burnout is because you’ve devoted a significant portion of your career to studying the phenomenon, and not just because you hate your job and are ready to go do something else. Is that directionally correct?Dr. Maslach: That is directionally correct, yes. I first stumbled upon the phenomenon back in the 1970s—which is, you know, 45, almost 50 years ago now—and have been fascinated with trying to understand what is going on.Corey: So, let’s start at the very beginning because I’m not sure in, I guess, the layperson context that I use the term that I fully understand it. What is burnout?Dr. Maslach: Well, burnout as we have been studying it over many years, it’s a stress phenomenon, okay, it’s a response to stressors, but it’s not just the exhaustion of stress. That’s one component of it, but it actually has two other components that go along with it. One is this very negative, cynical, hostile attitude toward the job and the other people in it, you know, “Take this job and shove it,” kind of feeling. And usually, people don’t begin their job like that, but that’s where they go if they become more burned out.Corey: I believe you may have just inadvertently called out a decent proportion of the tech sector.Dr. 
Maslach: [laugh].Corey: Or at least, that might just be my internal cynicism rising to the foreground.Dr. Maslach: No, it’s not. Actually, I have heard from a number of tech people over the past decades about just this kind of issue. And so I think it’s particularly relevant. The third component that we see going along with this, it usually comes in a little bit later, but I’ve heard a lot about this from tech people as well, and that is that you begin to develop a very negative sense of your own self, and competence, and where you’re going, and what you’re able to do. So, the stress response of exhaustion, the negative cynicism towards the job, the negative evaluation of yourself, that’s the trifecta of burnout.Corey: You’ve spent a lot of your early research at least focusing on, I guess, occupations that you could almost refer to as industrial, in some respects: working with heavy equipment, working with a variety of different professionals in very stressful situations. It feels weird, on some level, to say, “Oh, yeah, my job is very stressful. In that vein, I have to sit in front of a computer all day, and sometimes I have to hop on a meeting with people.” And it feels, on some level, like that even saying, “I’m experiencing burnout,” in my role is a bit of an overreach.Dr. Maslach: Yeah, that’s an interesting point because, in fact, yes, when we think about OSHA, you know, and occupational risks and hazards, we do think about the chemicals, and the big equipment, and the hazards, so having more psychological and social risk factors, is something that probably a lot of people don’t resonate to immediately and think, well, if you’re strong, and if you’re resilient, and whatever, you can—anybody can handle that, and that’s really a test almost of your ability to do your work. But what we’re finding is that it has its own hazards, psychological and social as well. 
And so, burnout is something that we’ve seen in a lot of more people-oriented professions, from the beginning. Healthcare has had this for a long time. Various kinds of social services, teaching, all of these other things. So, it’s actually not a sign of weakness as some people might think.
Corey: Right. And that’s part of the challenge and, honestly, one of the reasons that I’ve stayed away from having in-depth discussions about the topic of burnout on the show previously is it feels that—rightly or wrongly, and I appreciate your feedback on this one either way—it feels like it’s approaching the limits of what could be classified as mental health. And I can give terrible advice on how computers work—in fact, I do on a regular basis; it’s kind of my thing—and that’s usually not going to have any lasting impact on people who don’t see through the humor part of that. But when we start talking about mental health, I’m cautious because it feels like an inadvertent story or advice that works for some but not all has the potential to do a tremendous bit of damage, and I’m very cautious about that. Is burnout a mental health issue? Is it a medical issue that is recognized? Where does it start, and where does it stop on that spectrum?
Dr. Maslach: It is not a medical issue—and the World Health Organization, which just came out with a statement about this in 2019 on burnout, they’re recognizing it as an occupational risk factor—made it very clear that this is not a medical thing. It is not a medical disease, it doesn’t have a certain set of medical diagnoses, although people tend to sometimes go there. Can it have physical health outcomes? In other words, if you’re burning out and you’re not sleeping well, and you’re not eating well, and not taking care of yourself, do you begin to impair your physical health down the road? Yes.
Could it also have mental health outcomes, that you begin to feel depressed, and anxious, and not knowing what to do, and afraid of the future? 
Yes, it could have those outcomes as well. So, it certainly is kind of like—I can put it this way, like a stepping stone in a path to potential negative health: physical health, or mental health issues. And I think that’s one of the reasons why it is so important. But unfortunately, a lot of people still view it as somebody who’s burned out isn’t tough enough, strong enough, they’re wimpy, they’re not good enough, they’re not a hundred percent.And so the stigma that is often attached to burnout, people not only indulge it, but they feel it directed towards them, and often they will try to hide the kinds of experiences they’re having because they worry that they are going to be judged negatively, thrown under the bus, you know, let go from the job, whatever, if they talk about what’s actually happening with them.Corey: What do you see, as you look around, I guess, the wide varieties of careers that are susceptible to burnout—which I have a sneaking suspicion based upon what you’ve said rounds to all of them—what do you think is the most misunderstood, or misunderstood aspects of burnout?Dr. Maslach: I think what’s most misunderstood is that people assume that it is a problem of the individual person. And if somebody is burned out, then they’ve got to just take care of themselves, or take a break, or eat better, or get more sleep, all of those kinds of things which cope with stressors. What’s not as well understood or focused on is the fact that this is a response to other stressors, and these stressors are often in the workplace—this is where I’ve been studying it—but in essentially in the larger social, physical environment that people are functioning in. They’re not burning out all by themselves.There’s a reason why they are feeling the kind of exhaustion, developing that cynicism, beginning to doubt themselves, that we see with burnout. 
So there, if you ever want to talk about preventing burnout, you really have to be focusing on what are the various kinds of things that seem to be causing the problem, and how do we modify those? Coping with stressors is a good thing, but it doesn’t change the stressors. And so we really have to look at that, as well as what people can bring about, you know, taking care of themselves or trying to do the job better or differently.Corey: I feel like it’s impossible to have a conversation like this without acknowledging the background of the past year that many of us have spent basically isolated, working from home. And for some folks, okay, they were working from home before, but it feels different now. At least that’s the position I find myself in. Other folks are used to going into an office and now they’re either isolated—and research shows that it has been worse, statistically, for single people versus married people, but married people are also trapped at home with their spouse, which sounds half-joking but it is very real. At some point, distance is useful.And it feels like everyone is sort of a bit at their wit’s end. It feels like things are closer to being frayed, there’s a constant sense that there’s this, I guess, pervasive dread for the past year. Are you seeing that that has a potential to affect how burnout is being expressed or perceived?Dr. Maslach: I think it has, and one of the things that we clearly see is that people are using the word burnout, more and more and more and more. It’s almost becoming the word du jour, and using it to describe, things are going wrong and it’s not good. And it may be overstretching the use of burnout, but I think the reason of the popularity of the term is that it has this kind of very vivid imagery of things going up in smoke, and can’t handle it, and flames licking at your heels, and all this sort of stuff so that they can do that. 
I even got a comment from a colleague in France just a few days ago, where they’re talking about, “Is burnout the malady of the century?” you know, kind of thing. And it’s being used a lot; it’s sometimes maybe overused, but I think it’s also striking a chord with people as a sign that things are going badly, and I don’t know how to deal with it in some way. Corey: It also feels, on some level, for those of us who are trapped inside, it kind of almost feels like it’s a tremendous expression of privilege because who am I to have a problem with this? Oh, I have to go inside and order a lot of takeout and spend time with my family. And I look at how folks who are nowhere near as privileged have to go and be essential workers and show up in increasingly dangerous positions. And it almost feels like burnout isn’t something that I’m entitled to, if that makes sense. Dr. Maslach: [laugh]. Yeah. It’s an interesting description of that because I think there are ways in which people are looking at their experience and dealing with it, and like many things in life, I find that all of these things are a bit of a double-edged sword; there’s positive and there’s negative aspects to them. And so when I’ve talked with some people about now having to work from home rather than working in their office, they’re also bringing up, “Well, hey, I’ve noticed that the interviews I’m doing with potential clients are actually going a little better”—you know, this is from a law office—“And trying to figure out how—are we doing it differently so that people can actually relate to each other as human beings instead of the suit and tie in the big office? What’s going on in terms of how we’re doing the work that there may be actually a benefit here?” For others, it’s been, “Oh, my gosh. 
I don’t have to commute, but endless meetings and people are thinking I’m not doing my job, and I don’t know how to get in touch, and how do we work together effectively?” And so there’s other things that are much more difficult, in some sense. I think another thing that you have to keep in mind that it’s not just about how you’re doing your work, perhaps differently, or you’re under different circumstances, but people, so many people have lost their jobs, and are worried that they may lose their jobs.That we’re actually finding that people are going into overdrive and working harder and more hours as a way of trying to protect from being the next one who won’t have any income at all. So, there’s a lot of other dynamics that are going on as a result of the pandemic, I think, that we need to be aware of.Corey: One thing that I’d like to point out is that you are a Professor Emerita of Psychology at Berkeley, which means you presumably wound up formulating this based upon significant bodies of peer-reviewed research, as opposed to just coming up with a thesis, stating it as if it were fact, and then writing an entire series of books on it. I mean, that path, I believe, is called being a venture capitalist, but I may be mistaken on that front. How do you effectively study something like burnout? It feels like it is so subjective and situation-specific, but it has to have a normalization aspect to it.Dr. Maslach: Uh, yeah, that’s a good point. I think, in fact, the first time I ever wrote about some of the stuff that I was learning about burnout back in the mid ’70s—I think it was ’75, ’76 maybe—and it was in a magazine, it wasn’t in a journal. It wasn’t peer-reviewed because not even peer-reviewed journals would review this; they thought it was pop psychology, and eh. So, I would get, in those days, snail mail by the sackfuls from people saying, “Oh, my God. I didn’t know anybody else felt like this. Let me tell you my story.”You know, kind of thing. 
And so that was really, after doing a lot of interviews with people, following them on the job when possible to, sort of, see how things were going, and then writing about the basic themes that were coming out of this, it turned out that there were a lot of people who responded and said, “I know that. I’ve been there. I’m experiencing it.” Even though each of them were sort of thinking, “I’m the only one. What’s wrong with me? Everybody else seems fine.” And so part of the research in trying to get it out in whatever form you can is trying to share it because that gives you feedback from a wide variety of people, not only the peers reviewing the quality of the research, but the people who are actually trying to figure out how to deal effectively with this problem. So it’s, how do I and my colleagues actually have a bigger, broader conversation with people from which we learn a lot, and then try and say, okay, and here’s everything we’ve heard, and let’s throw it back out and share it and see what people think. Corey: You have written several books on the topic, if I’m not mistaken. And one thing that surprises me is how much what you talk about in those books seems to almost transcend time. I believe your first was published in 1982—Dr. Maslach: Right. Corey: —if I’m not mistaken—Dr. Maslach: Yes. Corey: —and it’s an awful lot of what it talks about still feels very much like it could be written today. Is this just part of the quintessential human experience? Or has nothing new changed in the last 200 years since the Industrial Revolution? How is it progressing, if at all, and what does the future look like? Dr. Maslach: Great questions and I don’t have a good answer for you. But we have sort of struggled with this because if you look at older literature, if you even go back centuries, if you even go back in parts of the Bible or something, you’re seeing phrases and descriptions sometimes that sound a lot like burnout, although we’re not using that term. 
So, it’s not something that I think just somehow got invented; it wasn’t invented in the ’70s or anything like that. But trying to trace back those roots and get a better sense of what are we capturing here is fascinating, and I think we’re still working on it.People have asked, well, where did the term ‘burnout’ as opposed to other kinds of terms come from? And it’s been around for a while, again, before the ’70s or something. I mean, we have Graham Greene writing the novel A Burnt-Out Case, back in the early ’60s. My dad was an engineer, rarefied gas dynamics, so he was involved with the space program and engineers talk about burnout all the time: ball bearings burn out, rocket boosters burn out. And when they started developing Silicon Valley, all those little startups and enterprises, they advertised as burnout shops. And that was, you know, ’60s, into the ’70s, et cetera, et cetera. So, the more modern roots, I think probably have some ties to that use of the term before I and other researchers even got started with it.Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.Corey: This is one of those questions that is incredibly self-serving, and I refuse to apologize for it. How can I tell whether I’m suffering from burnout, versus I’m just a jerk with an absolutely terrible attitude? And that is not as facetious a question as it probably sounds like.Dr. Maslach: [laugh]. Yeah. Well, part of the problem for me—or the challenge for me—is to understand what it is people need to know about themselves. 
Can I take a diagnostic test which tells me if I am burned out or if I’m something else?Sort of the more important question is, what is feeling right and what is not feeling so good—or even wrong—about my experience? And usually, you can’t figure that all out by yourself and you need to get other input from other people. And it could be a counselor or therapist, or it could be friends or colleagues who you have to be able to get to a point where we can talk about it, and hear each other, and get some feedback without putdowns, just sort of say, “Yeah, have you ever thought about the fact that when you get this kind of a task, you usually just go crazy for a while and not really settle down and figure out what you really need to do as opposed to what you think you have to do?” Part of this, are you bringing yourself in terms of the stress response, but what is it that you’re not doing—or that you’re doing not well—to figure out solutions, to get help or advice or better input from others. So, it takes time, but it really does take a lot of that kind of social feedback.So, when I said—if I can stay with it a little bit more—when I first was writing and publishing about and all these people were writing back saying, “I thought I was the only one,” that phenomenon of putting on a happy face and not letting anybody else see that you’re going through some difficult challenges, or feeling bad, or depressed, or whatever is something we call pluralistic ignorance; means we don’t have good knowledge about what is normal, or what is being shared, or how other people are because we’re all pretending to put on the happy face, to pretend and make sure that everybody thinks we’re okay and is not going to come after us. 
But if we all do that, then we all, together, are creating a different social reality that people perceive rather than actually what is happening behind that mask.Corey: It feels, on some level, like this is an aspect of the social media problem, where we’re comparing our actual lives and all the bloopers that we see to other people’s highlight reels because few people wind up talking very publicly about their failures.Dr. Maslach: Oh, yeah. Yeah. And often for good reason because they know they will be attacked and dumped. And there could be some serious consequences, and you just say, “I’m going to figure out what I’m going to do on my own.”But one of the things that when I work with people, and I’m asking them, “What do you think would help? What sort of things that don’t happen could happen?” And so forth, one of the things that goes to the top of the list is having somebody else; a safe relationship, a safe place where we can talk, where we can unburden, where you’re not going to spill the beans to everybody else, and you’re getting advice, or you’re getting a pat on the back, or a shoulder to cry on, and that you’re there for them for the same kind of reason. So, it’s a different form of what we think of as social network. It used to be that a network like that meant that you had other people, whether family, friends, neighbors, colleagues, whoever, that you knew, you could go to; a mentor, an advisor, a trusted ally, and that you would perform that role for them and other people, as well.And what has happened, I think, to add to the emphasis on burnout these days, is that those social connections, those trusts, between people has really been shredding, and, you know—or cut off or broken apart. And so people are feeling isolated, even if they’re surrounded by a lot of other people, don’t want to raise their hand, don’t want to say, “Can we talk over coffee? I’m really having a bad day. 
I need some help to figure out this problem.” And so one of those most valuable resources that human beings need—which is other people—is, if we’re working in environments where that gets pulled apart, and shredded, and it’s lacking, that’s a real risk factor for burnout.Corey: What are the things that contribute to burnout? It doesn’t feel, based upon what you’ve said so far, that it’s one particular thing. There has to be points of commonality between all of this, I have to imagine.Dr. Maslach: Yeah.Corey: Is it possible to predict that, oh, this is a scenario in which either I or people who are in this role are likely to become burned out faster?Dr. Maslach: Mm-hm. Yeah. Good question and I don’t know if we have a final answer, but at this point, in terms of all the research that’s been done, not just on burnout, but on much larger issues of health, and wellbeing, and stress, and coping, and all the rest of it, there are clearly six areas in which the fit between people and their job environment are critical. And if the fit is—or the match, or the balance—is better, they are going to be at less risk for burnout, they’re more likely to be engaged with work.But if some real bad fits, or mismatches, occur in one or more of these areas, that raises the risk factor for burnout. So, if I can just mention those six quickly. And these are not in any particular order because I find that people assume the first one is the worst or the best, and it’s not. Any rate, one of them has to do with that social environment I was just talking about; think of it as the workplace community. All the people whose paths you cross at various points—you know, coworkers, the people you supervise, your bosses, et cetera—so those social relationships, that culture, do you have a supportive environment which really helps people thrive? Can you trust people, there’s respect, and all that kind of thing going on? 
Or is it really what people are now describing as a socially toxic work environment?A second area has to do with reward. And it turns out not so much salary and benefits, it’s more about social recognition and the intrinsic reward you get from doing a good job. So, if you work hard, do some special things, and nothing positive happens—nobody even pats you on the back, nobody says, “Gee, why don’t you try this new project? I think you’re really good at it,” anything that acknowledges what you’ve done—it’s a very difficult environment to work in. People who are more at risk of burnout, when I asked them, “What is a good day for you? A good day. A really good day.” And the answer is often, “Nothing bad happens.” But it’s not the presence of good stuff happening, like people glad that you did such good work or something like that.Third area has to do with values—and this is one that also often gets ignored, but sometimes this is the critical bottom line—that you’re doing work that you think is meaningful, where you’re working has integrity, and you’re in line with that as opposed to value conflicts and where you’re doing things that you think are wrong: “I want to help people, I want to help cure patients, and here, I’m actually only supposed to be trying to help the hospital get more money.” When they have that kind of value conflict, this is often where they have to say, “I don’t want to sell my soul and I’m leaving.”The fourth area is one of fairness. And this is really about that whatever the policies, the principles, et cetera, they’re administered fairly. So, when things are going badly here—the mismatch—this is where discrimination lives, this is where glass ceilings are going on, that people are not being treated fairly in terms of the work they do, how they’re promoted, or all of those kinds of things. 
So, that interpersonal respect, and, sort of, social justice is missing. The next two areas—the fifth and sixth—are probably the two that have been the most well-known for a long time. One has to do with workload and how manageable it is. Given the demands that you have, do you have sufficient resources, like time, and tools, and whatever other kind of team support you need to get the job done. And control is about the amount of autonomy and the opportunities you have to perhaps improvise, or innovate, or correct, or figure out how to do the job better in some way. So, when people are having mismatches in work overload; a lack of control; you cannot improvise; where you have unfairness; where there are values that are just incompatible with what you believe is right, a sort of moral issue; where you’re not getting any kind of positive feedback, even when it’s deserved, for the kind of work you’re doing; and when you’re working in a socially toxic relationship where you can’t trust people, you don’t know who to turn to, people are having unresolved conflicts all the time. Those six areas are, those are the markers really of risk factors for burnout. Corey: I know that I’m looking back through my own career history listening to you recount those and thinking, “Oh, maybe I wasn’t just a terrible employee in every one of those situations.” Dr. Maslach: Exactly. Corey: I’m sure a lot of it did come from me, I want to be very clear here. But there’s also that aspect of this that might not just be a ‘me’ problem. Dr. Maslach: Yeah. That’s a good way of putting it. It’s really in some sense, it’s more of a ‘we’ problem than a ‘me’ problem. 
Because again, you’re not working in isolation, and the reciprocal relationship you have with other people, and other policies, and other things that are happening in whatever workplace that is, is creating a kind of larger environment in which you and many others are functioning. And we’ve seen instances where people begin to make changes in that environment—how do we do this differently? How can we do this better, let’s try it out for a while and see if this can work—and using those six areas, the value is not just, “Oh, it’s really in bad shape. We have huge unfairness issues.” But then it says, “It would be better if we could figure out a way to get rid of that fairness problem, or to make a modification so that we have a more fair process on that.” So, they’re like guideposts as well. As people start thinking through these six areas, you can sort of say, “What’s working well, in terms of workload, what’s working badly? Where do we run into problems on control? How do we improve the social relationships between colleagues who have to work together on a team?” They’re not just markers of what’s gone wrong, but they can—if you flip it around and look at it, let’s look at the other end—what is a path that we could get better? Make it right? Corey: If people want to learn more about burnout in general, and your work in it specifically, where can they go to find your work and learn more about what you have to say? Dr. Maslach: Obviously, there’s been a lot of articles, and now lots of things on the web, and in past books that I’ve written. And as you said, in many ways, they are still pretty relevant. The Truth About Burnout came out, oh gosh, ’97. 
So, that’s 25 years ago and it’s still work. But my colleague, Michael Leiter from Canada, and I have just written up a new manuscript for a new book in which we really are trying to focus on sharing everything we have learned about, you know, what burnout has taught us, and put that into a format of a book that will allow people to really take what we’ve learned and figure out how does this apply? How can this be customized to our situation? So, I’m hoping that that will be coming out within the next year. Corey: And you are, of course, welcome back to discuss your book when it releases. Dr. Maslach: I would be honored if you would have me back. That would be a wonderful treat. Corey: Absolutely. But in return, I do expect a pre-release copy of the manuscript, so I have something intelligent to talk about. Dr. Maslach: [laugh]. Of course, of course. Corey: Thank you so much for your time. I really appreciate it. Dr. Maslach: Well, thank you for having me. I appreciate the opportunity to share this, especially during these times. Corey: Indeed. Professor Christina Maslach, Professor Emerita of Psychology at Berkeley, I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment telling me why you’re burned out on this show. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.
About ScottScott is a web developer who has been blogging at https://hanselman.com for over a decade. He works in Open Source on ASP.NET and the Azure Cloud for Microsoft out of his home office in Portland, Oregon. Scott has three podcasts, http://hanselminutes.com for tech talk, http://thisdeveloperslife.com on developers' lives and loves, and http://ratchetandthegeek.com for pop culture and tech media. He's written a number of books and spoken in person to almost a half million developers worldwide.Links: Hanselminutes Podcast: https://www.hanselminutes.com/ Personal website: https://hanselman.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. 
When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Scott Hanselman of Microsoft. He calls himself a partner program manager—or is called a partner program manager. But that feels like it’s barely scraping the surface of who and what he is. Scott, thank you for joining me.Scott: [laugh]. Thank you for the introduction. I think my boss calls me that. It’s just one of those HR titles; it doesn’t really mean—you know, ‘program manager,’ what does it even mean?Corey: I figure it means you do an awful lot of programming. 
One of the hardest questions is, you start doing different things—and Lord knows you do a lot of them—is that awful question that you wind up getting at cocktail parties of, “So, what is it you do exactly?” How do you answer that?Scott: Yeah, it’s almost like, if you spent any time on Clubhouse recently, there was a wonderful comedian named Spunky Brewster on Instagram who had a whole thing where she talked about the introductions at the beginning of a Clubhouse thing, where it’s like, you’re a multi-hyphenate sandwich artist slash skydiver slash programmers slash whatever. One doesn’t want to get too full of one’s selves. I would say that I have for the last 30 years been a teacher and a professional enthusiast around computing and getting people excited about computing. And everything that I do, whether it be writing software, shipping software, or building community, hangs off of the fact that I’m an enthusiastic teacher.Corey: You really are. And you’re also very hard to pin down. I mean, it’s pretty clear to basically the worst half of the internet, that you’re clearly a shill. The problem is defining exactly what you’re a shill for. You’re obviously paid by Microsoft, so clearly you push them well beyond the point when it would make sense to.You have a podcast that has been on for over 800 episodes—which puts this one to shame—called Hanselminutes, and that is, of course, something where you’re shilling for your own podcast. You’ve recently started on TikTok, which I can only assume is what the kids are into these days. 
You’re involved in so many different things and taking so many different positions, that it’s very hard to pin down what is the stuff you’re passionate about.Scott: I’m going to gently push back and say—Corey: Please do.Scott: That if one were to care to look at it holistically, I am selling enthusiasm around free and open-source software on primarily the Windows platform that I’m excited about, and I am selling empowerment for the next generation of people who want to do computing. Before I went to Microsoft, my blog and my podcast existed, and I was consistent in my, “Hey, have you heard the news?” Message to anyone who would listen. And I taught at both Portland Community College and Oregon Institute of Technology, teaching web services and history of the web and C# and all that kind of stuff. So, I’m one of those people where if you touch on a topic that I’m interested in, I’ll be like, “Oh, my goodness, let’s”—and I’ll just like, you know, knock everything off the desk and I’m going to be like, “Okay, let’s build a model, a working model of the solar system here, now. The orange is the sun.”And it’s like, suddenly now we’re talking about science, like Hank Green or whatever. My family will ask me, “Why isn’t the remote control working?” And then I’ve taken it apart and I’m explaining to them how the infrared LED inside works. And, you know, how can you not be excited about all these things? And that’s my whole thing about computing and the power that being able to program computers represents to me.Corey: I would agree with that. I’d say that one thing that is universal about everything you’re involved in is the expression I heard that I love and am going to recapture has been, “Sending the elevator back down.”Scott: Oh, yeah. Throwing ladders, ropes, elevators. 
I am very blessed to have made it out of my neighborhood, and I am very hopeful that anyone who is in a situation that they do not want to be in could potentially use coding, programming, IT, computing as the great equalizer and that I could somehow lend my privilege to them to get the things done and solve the problems that they want to solve with computers. Corey: I’m sure that you’ve been asked ad nauseam about—you work in free and open-source software. You’ve been an advocate for this, effectively, for your entire career; did no one tell you you work at Microsoft? But that’s old Microsoft in many respects. That’s something that we’ve covered with a bunch of different guests previously from Microsoft, and it’s honestly a little—it’s becoming a bit of a tired trope. It was a really interesting conversation a few years back that, oh, it’s clearly all just for show. Well, that is less and less obvious, and more tired and frankly bad take as time progresses. So, I want to go back a bit further into my own personal journey because it turns out that the number one reason to reach out to you for anything is tech support on various things. I don’t talk about this often, but I started my career moonlighting as a Windows admin, back in the Windows 2003 server days; and it was an experience, and licensing was a colossal pain, and I finally had enough of it one day, in 2006, switched over to Unix administration on BSD, and got a Mac laptop, and that was really the last time that I used Windows in anger. Now, it’s been 15 years since that happened, and I haven’t really been tracking the Windows ecosystem. What have I missed? Scott: [laugh]. There’s a lot there that you just said. So first, different people have their religions and they’re excited about them, and I encourage everyone to be excited about the religion that they’re excited about. It’s great to be excited about your thing, but it’s also really not cool to be a zealot about your thing. 
So hey, be excited about Windows, be excited about Linux, be excited about Mac.

Just don’t tell me that I’m going to heck because I didn’t share your enthusiasm. Let’s just be excited together and we can be friends together. I’ve worked on Linux at Nike, I’ve worked on Mac, I’ve worked on Windows, you know, I’ve been there before these things existed and I’ll be there afterwards.

Corey: Exactly. At some point being a zealot for a technology just sort of means you haven’t been around the block enough to understand how it’s going to break, how it’s going to fail, how it’s going to evolve, and it doesn’t lead to a positive outcome for anyone. It fundamentally becomes a form of gatekeeping more than anything else, and I just don’t have the stomach for it.

Scott: Yeah. And ultimately, we’re just looking for—you know, we got these smart rocks that we taught how to think with lightning, and they’re running for loops for us. And maybe they’re running them in the cloud, maybe they’re running locally. So, I’m not really too worried about it. Windows is my thing of choice, but just, you know, one person’s Honda is another person’s Toyota; you get excited about the brand that you start out with.

So, that’s that. Currently, though, Windows has gone, at least in the last maybe 20 years, from one of those things where there’s generational pain, and, like, “Microsoft killed my Pappy, and I’ll never forgive you.” And it’s like, yeah, there was some dumb stuff in the ’90s with Internet Explorer, but as a somewhat highly placed middle manager at Microsoft, I’ve never been in an active mustache-twirling situation where I was behind closed doors and anyone thought anything nefarious. There’s only a true, “What’s the right thing for the customer?
What is the right thing for the people?”

My whole thing is to make it so developers can develop more easily on Windows, so I’m very fortunate to be helping some folks in a partnership between the Windows division and the developer division that I work in to make Windows kick butt when it comes to dev. Historically, the Windows terminal, or what’s called cmd.exe which is run by a thing called the console host has sucked; it has lagged behind. So, if you drop out to the command line, you’ve got the, you know, the old, kind of, quote-unquote, “DOS shell” with a cmd processor—it’s not really DOS—running in an old console host. And it’s been there for gosh, probably early ’90s. That sucks.

But then you got PowerShell. And again, I want to juxtapose the difference between a console—or a terminal—and a shell. They’re different things. There’s lots of great third-party terminals in the ecosystem. There’s lots of shells to choose from, whether it be PowerShell, PowerShell Core—now PowerShell 7.0—or the cmd, as well as bash, and Cygwin, and zsh, and fish.

But the actual thing that paints the text on Windows has historically not been awesome. So, the new open-source Windows terminal has been the big thing. If you’re a Machead and you use iTerm2, or Hyper, or things like that, you’ll find it very comfortable. It’s a tabbed terminal, split-screen, ripping fast, written in, you know, DirectX, C++ et cetera, et cetera, all open-source, and then it lets you do transparency, and background colors, and ligature fonts, and all the things that a great modern terminal would want to do. That is kind of the linchpin of making Windows awesome for developers, then gets even awesomer when you add in the ability that we’re now shipping an actual Linux kernel, and I can run N number of Linuxes side-by-side, in multiple panes, all within the terminal.

This getting to the point about juxtaposing the difference between a terminal and a console and a shell.
So, I’ve got, on the machine I’m talking to you on right now, on my third monitor, I’ve got Windows terminal open with PowerShell on Windows on the left, Ubuntu 18.04 LTS on the right, with the fish shell. And then I’ve got another Ubuntu 20.04 with bash, a standard bash shell.

And I’m going and testing stuff in Docker, and running .NET in Docker, and getting ready to deploy my own podcast website up into Azure. And I’m doing it in a totally organic way. It’s not like, “Oh, I’m just running a virtual machine.” No, it’s integrated. That’s what I think you’d be impressed with.

Corey: That right there is the reason that I generally tended to shy away from getting back into the Windows ecosystem for the longest time—and this is not a slam on Windows, by any stretch of the—

Scott: No of course. Sure, sure, sure.

Corey: —imagination—my belief has always been that you operate within the environment as it’s intended to be operated within, and it felt at the time, “Oh, install Cygwin, and get all this other stuff going, and run a VM to do it.” It felt like I was fighting upstream in some respects.

Scott: Oh, yeah, that’s a great point. Let’s talk about that for a second. So—

Corey: Let’s do it.

Scott: So, Cygwin is the GNU utilities that are written in a very nice portable C, but they are written against the Windows kernel. So, the example I like to use is ls, you type ls, you list out your directory, right? So, ls and dir are the same thing for this conversation. Which means that someone has to then call a system call—syscall in Linux, Windows kernel call in Windows—and say, “Hey, would you please enumerate these files, and then give me information about them, and check the metadata?” And that has to call the file system and then it’s turtles all the way down.

Cygwin isn’t Linux. It’s the bash and GNU utilities recompiled and compiled against the Windows stuff. So, it’s basically putting a bash skin on Windows, but it’s not Linux; it’s bash. Okay?
But WSL is actually Linux, and rather than firing up a big 30 gig Hyper-V, or VirtualBox, or Parallels virtual machine, which is, like, a moment—“I’m firing up the VM; call me in an hour when it comes back up.”—and when the VM comes up, it’s, like, a square on your screen and now you’re dealing with another thing to manage.

The WSL stuff is actually a utility virtual machine built on a lower subsystem, the virtualization platform, and it starts in less than a second. You can start it faster than you can say, one one-thousand. And it goes instantly up, it automatically allocates and deallocates memory so that it’s smart about memory, and it’s running the actual Linux kernel, so it’s not pretending to be Linux. So, if your goal is a Linux environment and you’re a Linux developer, the time of Linux on the desktop is happening, in this case, on the Windows desktop. Where you get interesting stuff, and where I think your brain might explode is, imagine you’re in the terminal, you’re at the Linux file system at the bash prompt, and you type ‘notepad.exe.’ What would you expect to happen? You’d expect it to try to find it in a Linux path and fail.

Corey: Right. And then you’re trying to figure out, am I in this environm—because you generally tend to run these things in the same-looking terminal, but then all the syntax changes as soon as you go back into the Windows native environment, you’re having to deal with line-ending issues on a constant basis, and you just—

Scott: Oh, yeah. All that stuff, where.

Corey: And as soon as you ask for help because back in those days, I was looking primarily into using freenode as my primary source of support because I was network staff on the network for the better part of a decade, and the answer is, “I’m having some trouble with Linux,” and the response is, “Oh, you’re doing this within a Windows environment?
Get a real computer, kid.” Because it’s still IRC, and being condescending and rude to anyone who makes different choices than you do is apparently the way that was done back then.

Scott: Well, today in 2020 because we don’t want to just have light integration with Windows—and by light integration, like, I don’t know if you remember firing up a virtual machine on Windows and then, like, copy-pasting a file, and we were all going like, “Oh, my God, that’s amazing.” I drug the file in and then it did a little bit of magic and then moved the file from Windows into Linux. What we want is to blur the lines between the two so you can move comfortably. When you type explorer.exe or notepad.exe in Linux on Windows, Linux says no, and then Windows gets the chance, fires it up, and can access the Linux file system.

And since Notepad now understands line endings, just happily, you can open up your .profile, your bash_profile, your csh file in Notepad, or—here’s where it gets interesting—Visual Studio Code, and comfortably run your Windows apps, talking to your Linux file system, or in the—coming soon, and we’ve blogged about this and announced it at Build last year, run Linux GUI apps seamlessly so that I could have two browsers up, two Chromes, one Windows and one Linux, side-by-side, which is going to make web testing even that much easier. And I’m moving seamlessly between the two. Even cooler, I can type explorer.exe and then pass in dot, which represents the current folder, and if the current folder is the Linux file system, we seamlessly have a Plan 9 server—basically a file server that lets you access your Linux file system—from—

Corey: Is it actually running Plan 9?

Scott: It is a Plan 9 server.

Corey: That is amazing. I’m sorry, that is a blast from the past.

Scott: I’m glad. And we can run N number of Linuxes; this isn’t just one Linux.
I’ve got Kali Linux, two different Ubuntus, and I could tar up the user mode files on mine, zip them up, give them to you, and you could go and type ‘wsl --import,’ and then have my Linux file system. Which means that we could make a custom Screaming in the Cloud distro, put it in the Windows Store, put it up on GitHub, build our own, and then the company could standardize on our Linux distro and run it on Windows.

Corey: That is almost as terrible an idea as using a DNS service as a database.

Scott: [laugh].

Corey: I love it. I’m totally there for it.

Scott: It’s really nice because it’s extremely—the point is, it has to have no friction, right? So, if you think about it this way, I just moved—I blogged about this; if people want to go and learn about it—I just moved my blog of 20 years off of a Windows Server 2008 server running under someone’s desk at a host, into Azure. This is a multi-month-long migration. My blog, my main site, kind of the whole Hanselman ecosystem moved up in Azure. So, I had a couple things to deal with.

Am I going to go from Windows to Linux? Am I going to go from a physical machine to a virtual machine? Am I going to go from a physical machine to a virtual machine to a Platform as a Service? And when I do that, well, how is that going to change the way that I write software? I was opening it in Visual Studio, pressing F5, and running it in IIS—the Internet Information Server for Windows—for the last 15, 20 years.

How do I change that experience? Well, I like Visual Studio; I like pressing F5; I like interactive debugging sessions. But I also like saving money running Linux in the cloud, so how can I have the best of all those worlds?
Because I wrote the thing in .NET, I moved into .NET 5, which runs everywhere, put together a Dockerfile, got full support for that in Visual Studio, moved it over into WSL so I can test it on both Windows and Linux.

I can go into my folder on my WSL, my Windows Subsystem for Linux, type code dot, open up Visual Studio Code. Visual Studio Code splits in half. The Windows client of Visual Studio Code runs on Windows; the server, the Visual Studio Code server, runs in WSL providing the bridge between the two worlds, and I can press F5 and have interactive debugging and now I’m a Linux developer even though I’ve never left Windows. Then I can right-click publish in Visual Studio to GitHub Actions, which will then throw it into the cloud, and I moved everything over into Azure, saved 30%, and everything’s awesome. I’m still a Windows developer using Visual Studio. So, it’s pretty much, I don’t know, non-denominational; kind of mixing the streams here.

Corey: It is. And let me take it a step further. When I’m on the road, the only computer I bring with me these days—well, in the before times, let’s be very realistic. Now, when ‘I’m on the road,’ that means going to the kitchen for a snack—the only computer I bring with me is my iPad Pro, which means that everything I do has a distinct application. For when I want to get into my development environment, historically it was, use some terminal app—I’m a fan of Blink, but everyone has their own; don’t email me.

And everything else I tended to use looked an awful lot like a web app. If there wasn’t a dedicated iOS app, it was certainly available via a web browser.
Which leads me to the suspicion that we’re almost approaching a post-operating-system world where the future development operating system begins to look an awful lot—and people are going to yell at me for this—like Visual Studio Code.

Scott: Mmm.

Corey: It supports a bunch of remote activities now that GitHub Codespaces is available—at least to my account; I don’t know if it’s generally available yet—but I’ve been using it; I love it; everything it winds up doing is hosted remotely in Azure; I don’t have to think about managing the infrastructure; it’s just another tab within GitHub, and it works. My big problem is that I’m trying to shake, effectively, 20 years of muscle memory of wrestling with Vim, and it takes a little bit of a leap in order to become comfortable with something that’s a more visually-oriented IDE.

Scott: Why don’t you use the VsVim, Jared Parsons Vim plugin for Visual Studio?

Corey: I’ve never yet found a plugin that I like for something else to make it behave like Vim. Vimperator is a browser extension, all of it just tends to be unfortunate and annoying in different ways. For whatever reason, the way that I’m configured or built, it doesn’t work for me in the same way. And it goes back to our previous conversation about using the native offering as it comes, rather than trying to make it look like something else.

Scott: Okay. I would just offer to you and for other Vim people who might be listening, that VS Code Vim does have 2.5 million installs, over 2 million people happily using that. And they are—

Corey: Come to find it only has 200,000 actual users; there was an installation bug and one person just kept trying over and over and over. I kid, I kid.

Scott: No, seriously though, these are actual Vim-heads and Jared Parsons is a developer at Microsoft who is like, out of his cold dead hands you’ll pull his Vim. So, there’s solutions; whether you’re Vim or Emacs, you know, we welcome all comers.
But to your point, the Visual Studio, once it got split in half, where the language services, those services that provide context to Python, Ruby, C#, C++, et cetera, once those extensions can be remoted, they can run on Windows, they can run on Linux, they can run on the cloud. So, VS Code being split in half as a client-server application has really made it shine. And for me, that means that I don’t notice a difference, whether I’m running VS Code on Windows or running VS Code to a remote Linux install, or even using SSH and coding on Windows remotely to a Raspberry Pi.

Corey: I love the idea. I’ve seen people do this, in some respects, back in the days of Code Server being a project on GitHub, and it took a fair bit of wrangling to get that to work in a way that wasn’t scarily insecure and unreliable. But once it was up and running, you could effectively plug a Raspberry Pi in underneath your iPad and effectively have a portable computer on the go that did local development. I’m looking at this and realizing the future doesn’t look at all like what I thought it was going to, and it’s really still kind of neat.

Scott: Mm-hm.

Corey: There’s a lot of value in being able to make things like this more accessible, and the reason I’m excited about a lot of this, too, is that aligned with a generous free tier opportunity, which I don’t know final pricing for things like GitHub Codespaces, suddenly the only real requirement is something that can render a browser and connect to the internet for an awful lot of folks to get started. It doesn’t require a fancy local overpowered development machine the way a lot of things used to. And yes, I know; there are certain kinds of development that are changing in that respect, but it still feels to me like it has never been easier to get started with all of this technology than ever before, with a counterargument that there’s so many different directions to go in.
“Oh, I want to get started using Visual Studio Code or learning to write JavaScript. Great. How do I do this? Let me find a tutorial.” And you find 20 million tutorials, and then you’re frozen with indecision. How do you get past that?

Scott: Yeah, there is and always will be, unfortunately, a certain amount of analysis paralysis that occurs. I started a TikTok recently to try to help people to get involved in coding, and the number one question I get—and I mean, thousands and thousands of them—are like, “Where do I start?” Because everyone seems to think that if they pick the wrong language, that will be a huge mistake. And I can’t think of a wrong language, you know? Like, what human language should I learn?

You know, English, Chinese, Arabic, Japanese. Pick one and then learn another one if you can. Learn a couple. But I don’t think there’s a wrong language to learn because the basics of computer science are the basics of computer science. I think what we need to do is remind people that computers are computers no matter whether they’re an Android phone or a Windows laptop, and that any forward motion at all is a good thing. I think a lot of people have analysis paralysis, and they’re just afraid to pick stuff.

Corey: I agree with what you’re saying, but I’m also going to push back gently on what you’re saying, as well. If someone who is new to the field was asking me what language to learn, I would be hard-pressed to recommend a language that was not JavaScript. I want to be clear, I do not understand or know JavaScript at all, but it’s clear from what I’m seeing, that is, in many ways, the language of the future. It is how frontend is being interacted with; there are projects from every cloud provider that wind up managing infrastructure via JavaScript primitives. There are so many on-ramps for this, and the user experience for new folks is phenomenal compared to any language that I’ve worked with in my career.
Would you agree with that or disagree with that assessment?

Scott: So, I’ve written blog posts on this topic, and my answer is a little more ‘it depends.’ I say that people should always learn JavaScript and one other language, preferably a systems language, which also may be JavaScript. But rather than thinking about things language-first, we think about things solutions-first. If someone says, “I want to do a lot of data science,” you don’t learn JavaScript. If someone says, “I want to go and write an Android app,” yeah, you could do that in JavaScript, but JavaScript is not the answer to all questions.

Just as the English language, while it may be the lingua franca, no pun intended, it is not the only language one should pick. I usually say, “Well, what do you want to do?” “Well, I want to write a video game for the Xbox.” Okay, well, you’re probably not going to do that in JavaScript. “Oh, I want to do data science. I want to write an iPhone app.” JavaScript is the language you should learn if you’re going to be doing things on the web, yes, but if you’re going to be writing the backend for WhatsApp, then you’re not going to do that in JavaScript.

Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: Yeah, I think you’re right. It comes down to what is the problem you’re trying to solve for? Taking the analogy back to human languages, well, what is your goal? Is it just to say that you’ve learned a language and to understand, get a glimpse at another culture through its language?
Yeah, there is no wrong answer. If it’s that you want to go live in France one day and participate in French business discussions, I have a recommendation for you, and it’s probably not Sanskrit.

At some point, you have to align with what people want to do and the direction they’re going in with the language selection. What I like about JavaScript is, frankly, its incredible versatility as far as problems to which it can be applied. And without it, I think you’re going to struggle as you enter the space. My first language was crappy Perl—slash bash because everyone does bash when you’re a systems administrator—and then it has later evolved now to crappy Python as my language of choice. But I’m not going to be able to effectively do any frontend work in Python, nor would I attempt to do so.

My way of handling frontend work now is to have the good sense to pay a professional. But if you’re getting started today and you’re not sure what you want to do in your career, my opinion has always been that if you think you know what you want to do in your career, there’s a great chance you’re going to be wrong, but pursuing the thing that you think you want to do will open other opportunities and doors, and present things to you that will catch your interest in a way you might not be able to anticipate. So, especially early on in careers, I like biasing for things that give increased options, that boost my optionality as far as what I’m going to be able to do.

Scott: Okay. I think that’s fair. I think that no one ever got fired for picking IBM; [laugh] no one ever jeopardized their career by choosing JavaScript. I do think it’s a little more nuanced, as I mentioned.

Corey: It absolutely is. I am absolutely willing to have a disagreement with you on that front. I think the thing that we’re aligned on is that whatever you pick, make sure it’s something you’re interested in.
Don’t do it just for—like, “Well, I’m told I can make a lot of money doing X.” That feels like it’s the worst reason to do things, in isolation.

Scott: That’s a tough one. I used to think that, too, but I am thinking that it’s important to note and recognize that it is a valid reason to get into tech, not for the passion, but for no other reason than that I want to make a lot of money.

Corey: Absolutely. I could not agree with you more, and that is… something I’ve gotten wrong in the past.

Scott: Yeah. And I have been a fan of saying, you know, “Be passionate and work on these things on the side,” and all that kind of stuff. But all of those things involve a lot of assumptions and a lot of privileges that, you know, people have: that you have spare time and that you have a place to work on these things. I work on stuff on the side because it feeds my spirit. If you work on woodworking, or drones, or gardening on the side, you know, not everything you work on the side has to be steeped in hustle culture and having a startup, or something that you’re doing on the side.

Corey: Absolutely. If you’re looking at a position of wanting to get into technology because it leads to a better financial outcome for you and that is what motivates you, you’re not wrong.

Scott: Exactly.

Corey: The idea that, “Oh, you have to love it or you’ll never succeed.” I think that some of the worst advice we ever wind up giving folks early in their career—particularly young people—is, ‘follow your passion.’ That can be incredibly destructive advice in some contexts, depending upon what it is you want to do and what you want your life to look like.

Scott: Yeah, exactly.

Corey: One of the things that I’ve always been appreciative of from afar with Microsoft has been there’s an entire developer ecosystem, and historically, it’s focused on languages I can barely understand: ASP.NET, the C# is deep in that space, F#, I think, is now a thing as well.
There’s an entire ecosystem around this with Visual Studio the original, not Visual Studio Code—turns out naming is one of those things that no tech company seems to get right—but it feels almost like there’s an entire ecosystem there for those of us who spent significant time—and I’m speaking for myself here, not you—in the open-source community talking about things like Perl and whatnot, I never got much exposure to stuff like that. I would also classify Enterprise Java as being in that direction as well. Is there a bifurcation there that I’m not seeing, or was I just never talking to the right people? All the above? Maybe I was just—maybe I had blinders on; didn’t realize it.

Scott: There was a time when the Microsoft developer ecosystem meant write things for Windows, do things on Windows, use languages that Microsoft made and created. And now, with the rise of the cloud and with the rise of Software as a Service, Microsoft is a much simpler company, which is a funny thing to say for such a complicated company. Microsoft would love to run your for loop in the cloud for money. We don’t care what language you use; we want you to use the language that makes you happy. Somewhere around five to seven years ago, in the developer division, we started optimizing for developer happiness.

And that’s why you can write Ruby, and Perl, and Python, and C, and C++, and C#, and all those different things. Even C# now, and .NET, is owned by the .NET Foundation and not by Microsoft. Microsoft, of course, is one of the primary users, but we’ve got a lot of—Samsung is a huge contributor, Google is a huge contributor, Amazon Web Services is a big contributor to .NET.

So, Microsoft’s own zealotry towards—and bias towards our own languages has, kind of, gone away because Office is on iPhone, right? Like, anywhere that you are, we’ll go there.
So, we’re really going where the customer is rather than trying to funnel the customer into where we want them to be, which is really an inverted way of doing things over the way it was done 20, 30 years ago. In my opinion.

Corey: This gets back to the idea of the Microsoft cultural transformation. It hasn’t just been an internal transform; it’s been something that is involved with how it’s engaging with its customers, how it’s engaging with the community, how it’s becoming available in different ways to different folks. It’s hard to tell where a lot of these things start and where a lot of these things stop. I don’t pretend to be a Microsoft “fanboy,” quote-unquote, but I believe it is impossible to look at what has happened, especially in the world of cloud, and not at the very least respect what Microsoft has been able to achieve.

Scott: Well, I came here to open source stuff. I’m surely not responsible for the transformation, I’m just a cog in the machine, but I can speak for the things that I own, like .NET and Visual Studio Community, and I think one of the things that we have gotten right is we are trying to create zero-distance products. You could be using Visual Studio Code, find a bug, suggest a feature, have a conversation in public with the PMs and devs that own the thing, get an insider’s build a few days later, and see that promoted to production within a week or two. There is zero distance between you the consumer and the creator of the thing.

And if you wanted to even fix the bug yourself, submit a pull request, and see that go into production, you could do that as well. You know, some of our best C# compiler folks are not working for Microsoft and they are giving improvements, they are making the product better. So, zero-distance in many ways, if you look at the other products at Microsoft, like PowerToys is a great thing, which is [unintelligible 00:32:06] an incubator for Windows features.
We’re adding stuff to the PowerToys open-source project like launchers, and a thing called FancyZones that is a window tiling manager, you know, features that prosumers and enthusiasts always wished Windows could have, they can now participate in, thereby creating a zero-distance product in Windows itself.

Corey: And I want to point out as well that you are still Microsoft. You, the collective you. I suppose you personally; that is where your email address ends. But you’re still Microsoft. This is still languages, and tools, and SDKs, and frameworks used by the largest companies in the world. This zero-distance approach is being done on things that service banks, who are famously not the earliest adopters of some code that I wrote last night; it’s probably fine.

Scott: Do you know what my job was before I came here?

Corey: Tell me.

Scott: I was the chief architect at a finance company that created software for banks. I was responsible for a quarter of the retail online banking systems in North America, built on .NET and open-source software. [laugh].

Corey: So, you’ve lived that world. You’ve been that customer.

Scott: Trying to convince a bank that open-source was a good idea in the early 2000s was non-trivial. You know, sitting around in 2003, 2004, talking about Agile, and you know, continuous integration, and build servers, and then going and saying, “Hey, you should use the software,” trying to deal with lawyers and explain to them the difference between the MIT, Apache, and GPL licenses and what it means to their bank was definitely a challenge. And working through those issues, it has been challenging. But open-source software now pervades.
Just go and look at the license.txt in the Visual Studio Program Files folder to see all of the open-source software that is consumed by Visual Studio.

Corey: One last topic that I want to get to before we call it a show is that you’ve spent a significant portion of your career, at least recently, focusing on, more or less, where the next generation of engineers, developers, et cetera, come from. And to that end, you’ve also started recently with TikTok, the social media platform. Are those two things related, first off, or am I making a giant pile of unwarranted assumptions?

Scott: [laugh]. I think that is a fair assumption. So, what’s going on is I want to make sure that as I fade away and I leave the software industry in the next, you know, N number of years, that I’m setting up as many people as possible for success. That’s where my career started when I was a professor, and that’s hopefully where my career will end when I am a professor again. Hopefully, my retirement gig will have me teaching at some university somewhere.

And in doing that, I want to find the next million developers, right? Where are they, the next 10 million developers? They’re probably not on Twitter. They might be a lot of different places: they might be on Discord, they might be on Reddit, they might be on forums that I haven’t found yet. But I have found, on TikTok, a very creative and for the most part kind and inclusive community.

And both myself and also recently, the Visual Studio Code team have been hanging out there, and sharing our creativity, and having really interesting conversations about how you the listener can if not be a programmer, be a person that knows better the tools that are available to you to solve problems.

Corey: So, I absolutely appreciate and enjoy the direction that you’re going in, but again, people invite you to things and then spring technical support questions on you. Can you explain what TikTok is?
I’m still trying to wrap my head around it because I turned around and discovered I was middle-aged one day.

Scott: Sure. Well, I mean, I am an old man on TikTok, to be clear. TikTok, like Twitter, revels in its constraints. If you recall, there was a big controversy when Twitter went from 140 characters to 280 because people thought it was just letting the constraint that we were so excited about—which was artificial because it was the length of a standard message service text—

Corey: I’m one of those people who bitterly protested it. I was completely wrong.

Scott: Right? But the idea that something is constrained, that TikTok is either 15 seconds, or less than 60, it’s similar to Vine in that it is a tiny video; what can I do in one minute? Additionally, before they allowed uploading of videos, everything was constrained within the TikTok editor, so people would do amazing and intricate 30 and 40 shot transitions within a 60 second period of time. But one of the things I find most unique about TikTok is you can reply to a text comment with a video. So, I make a video—maybe I do 60 seconds on how to be a software engineer—somebody replies in text, I can then reply to that text with a video, and then a TikTok creator can do what’s called a stitch and reply to my video with a video.

So, I could take 15 seconds of yours, a comment that you made, and say, “Oh, this is a great comment. Here’s my thoughts on that comment.” Or we could even do a duet where you record a video and then I record one, side-by-side. And we either simulate that we’re actually having a conversation, or I react to your video as well.
Once you start teaching TikTok about yourself by liking things, you curate a very positive place for yourself.

You might get on TikTok, not logged in, and it’s dancing, and you might find some inappropriate things that you don’t necessarily want to see, or you’re not interested in, but one of the things that I’ve noticed as I talk about my home network and coding is people will say, “Oh, I finally found adjective TikTok; I finally found coding TikTok; I finally found IT TikTok. Oh, I’m going to comment on your post because I want to stay on networking TikTok.” And then your feed isn’t just a feed of the people that you follow, but it’s a feed of all the things that TikTok thinks you’re excited about. So, I am on this wonderful TikTok of linguistics and languages, and I’m learning about cultures, and I’m on indigenous TikTok, and I’m on networking TikTok. And the mix of creativity and the constraint of just 60 seconds has been, really, a joy. And I’ve only been there for about a month and I’ve been blessed to have 80,000 people hanging out with me there.

Corey: It sounds like you’re quite the fan of the platform, which alone in isolation, is enough to get me to look at it in more depth.

Scott: I am a fan of creativity. I would also say though, it’s very addictive once you find your people. I’ve had to put screen time limits on my own phone to keep me from burning time there.

Corey: That is all of tempting, provocative, and disturbing. I—

Scott: You should hang out with me on YouTube, then. I just got my 100,000 YouTube Silver Play Button in the mail. That’s where I spend my time doing my long-form. I just did, actually, 17 minutes on WSL and how to use Linux. That might be a good starter for you.

Corey: It very well might. So, if people want to learn more about what you’re up to, and how you think about the wide variety of things you’re interested in, where can they find you?

Scott: They should start at my last name dot com: Hanselman.com.
They used to be able to Google for Scott, and I was in an epic battle with Scott brand toilet paper tissue, and then they trademarked the name Scott and now I’m somewhere in the distant second or third page. It was a tragedy. But as an early comer—Corey: Oh, my condolences.Scott: Yeah, oh my God. As an early comer to the internet, it was me and Scott Fly Rods on the first page, for many, many years. And then—Corey: If it helps, you and Scott Fly Rods are both on page two.Scott: Oh. Well, the tyranny of the Scott toilet paper conspiracy against me has been problematic.Corey: Exactly.Scott: [laugh].Corey: Thank you so much for taking the time to speak with me today. I really do appreciate it.Scott: It’s my pleasure.Corey: Scott Hanselman, partner program manager at Microsoft and so much more. I’m Cloud Economist Corey Quinn. This is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with a crappy comment that starts with a comment that gatekeeps a programming language so we know to ignore it.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Martin

Martin Mao is the co-founder and CEO of Chronosphere. He was previously at Uber, where he led the development and SRE teams that created and operated M3. Prior to that, he was a technical lead on the EC2 team at AWS and has also worked for Microsoft and Google. He and his family are based in our Seattle hub and he enjoys playing soccer and eating meat pies in his spare time.

Links:
Chronosphere: https://chronosphere.io/
Email: contact@chronosphere.io

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome.
If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’ve often talked about observability, or as I tend to think of it when people aren’t listening, hipster monitoring. Today, we have a promoted episode from a company called Chronosphere, and I’m joined today by Martin Mao, their CEO and co-founder. Martin, thank you for coming on the show and suffering my slings and arrows.Martin: Thanks for having me on the show, Corey, and looking forward to our conversation today.Corey: So, before we dive into what you’re doing now, I’m always a big sucker for origin stories. Historically, you worked at Microsoft and Google, but then you really sort of entered my sphere of things that I find myself having to care about when I’m lying awake at night and the power goes out by working on the EC2 team over at AWS. Tell me a little bit about that. You’ve hit the big three cloud providers at this point. 
What was that like?Martin: Yeah, it was an amazing experience, I was a technical lead on one of the EC2 teams, and I think when an opportunity like that comes up on such a core foundational project for the cloud, you take it. So, it was an amazing opportunity to be a part of leading that team at a fairly early stage of AWS and also helping them create a brand new service from scratch, which was AWS Systems Manager, which was targeted at fleet-wide management of EC2 instances, so—Corey: I’m a tremendous fan of Systems Manager, but I’m still looking for the person who named Systems Manager Session Manager because, at this point, I’m about to put a bounty out on them. Wonderful service; terrible name.Martin: That was not me. So, yes. But yeah, no, it was a great experience, for sure, and I think just seeing how AWS operated from the inside was an amazing learning experience for me. And being able to create foundational pieces for the cloud was also an amazing experience. So, only good things to say about my time at AWS.Corey: And then after that, you left and you went to Uber where you led development and SRE teams that created and operated something called M3. Alternately, I’m misreading your bio, and you bought an M3 from BMW and went to drive for Uber. Which is it?Martin: I wish it was the second one, but unfortunately, it is the first one. So yes, I did leave AWS and joined Uber in 2015 to lead a core part of their monitoring and eventually larger observability team. And that team did go on to build open-source projects such as M3—which perhaps we should have thought about the name and the conflict with the car when we named it at the time—and other projects such as Jaeger for distributed tracing as well, and a logging backend system, too. So, yeah, definitely spent many years there building out their observability stack.Corey: We’re going to tie a theme together here. 
You were at Microsoft, you were at Google, you were at AWS, you were at Uber, and you look at all of this and decide, “All right. My entire career has been spent in large companies doing massive globally scaled things. I’m going to go build a small startup.” What made you decide that, all right, this is something I’m going to pursue?Martin: So, definitely never part of the plan. As you mentioned, a lot of big tech companies, and I think I always got a lot of joy building large distributed systems, handling lots of load, and solving problems at a really grand scale. And I think the reason for doing a startup was really the situation that we were in. So, at Uber as I mentioned, myself and my co-founder led the core part of the observability team there, and we were lucky to happen to solve the problem, not just for Uber but for the broader community, especially the community adopting cloud-native architecture. And it just so happened that we were solving the problem of Uber in 2015, but the rest of the industry has similar problems today.So, it was almost the perfect opportunity to solve this now for a broader range of companies out there. And we already had a lot of the core technology built-in open-source as well. So, it was more of an opportunity rather than a long-term plan or anything of that sort, Corey.Corey: So, before we dive into the intricacies of what you’ve built, I always like to ask people this question because it turns out that the only thing that everyone agrees on is that everyone else is wrong. What is the dividing line, if any, between monitoring and observability?Martin: That’s a great question, and I don’t know if there’s an easy answer.Corey: I mean, my cynical approach is that, “Well, if you call it monitoring, you don’t get to bring in SRE-style salaries. Call it observability and no one knows what the hell we’re talking about, so sure, it’s a blank check at that point.” It’s cynical, and probably not entirely correct. 
So, I’m curious to get your take on it.Martin: Yeah, for sure. So, you know, there’s definitely a lot of overlap there, and there’s not really two separate things. In my mind at least, monitoring, which has been around for a very long time, has always been around notification and having visibility into your systems. And then as the system’s got more complex over time, being able to understand that and not just have visibility into it but understand it a little bit more required, perhaps, additional new data types to go and solve those problems. And that’s how, in my mind, monitoring sort of morphed into observability. So, perhaps one is a subset of the other, and they’re not competing concepts there. But at least that’s my opinion. I’m sure there are plenty out there that would, perhaps, disagree with that.Corey: On some level, it almost hits to the adage of, past a certain point of scale with distributed systems, it’s never a question of is the app up or down, it’s more a question of how down is it? At least that’s how it was explained to me at one point, and it was someone who was incredibly convincing, so I smiled and nodded and never really thought to question it any deeper than that. But I look back at the large-scale environments I’ve been in, and yeah, things are always on fire, on some level, and ideally, there are ways to handle and mitigate that. Past a certain point, the approach of small-scale systems stops working at large scale. I mean, I see that over in the costing world where people will put tools up on GitHub of, “Hey, I ran this script, and it works super well on my 10 instances.”And then you try and run the thing on 10,000 instances, and the thing melts into the floor, hits rate limits left and right because people don’t think in terms of those scales. So, it seems like you’re sort of going from the opposite end. Well, this is how we know things work at large scale; let’s go ahead and build that out as an initially smaller team. 
Because I’m going to assume, not knowing much about Chronosphere yet, that it’s the sort of thing that will help a company before they get to the hyperscaler stage.Martin: A hundred percent, and you’re spot on there, Corey. And it’s not even just a company going from small-stage, small-scale simple systems to more complicated ones, actually, if you think about this shift in the cloud right now, it’s really going from cloud to cloud-native. So, going from VMs to container on the infrastructure tier, and going from monoliths to microservices. So, it’s not even the growth of the company, necessarily, or the growth of the load that the system has to handle, but this shift to containers and microservices heavily accelerates the growth of the amount of data that gets produced, and that is causing a lot of these problems.Corey: So, Uber was famous for disrupting, effectively, the taxi market. What made you folks decide, “I know. We’re going to reinvent observability slash monitoring while we’re at it, too.” What was it about existing approaches that fell down and, I guess, necessitated you folks to build your own?Martin: Yeah, great question, Corey. And actually, it goes to the first part; we were disrupting the taxi industry, and I think the ability for Uber to iterate extremely fast and respond as a business to changing market conditions was key to that disruption. So, monitoring and observability was a key part of that because you can imagine it was providing all of the real-time visibility to not only what was happening in our infrastructure and applications, but the business as well. So, it really came out of a necessity more than anything else. 
We found that in order to be more competitive, we had to adopt what is probably today known as cloud-native architecture, adopt running on containers and microservices so that we can move faster, and along with that, we found that all of the existing monitoring tools we were using, weren’t really built for this type of environment. And it was that that was the forcing function for us to create our own technologies that were really purpose-built for this modern type of environment that gave us the visibility we needed to, to be competitive as a company and a business.Corey: So, talk to me a little bit more about what observability is. I hear people talking about it in terms of having three pillars; I hear people talking about it, to be frank, in a bunch of ways so that they’re trying to, I guess, appropriate the term to cover what they already are doing or selling because changing vocabulary is easier than changing an entire product philosophy. What is it?Martin: Yeah, we actually had a very similar view on observability, and originally we thought that it is a combination of metrics, logs, and traces, and that’s a very common view. You have the three pillars, it’s almost like three checkboxes; you tick them off, and you have, quote-unquote, “Observability.” And that’s actually how we looked at the problem at Uber, and we built solutions for each one of those and we checked all three boxes. What we’ve come to realize since then is perhaps that was not the best way to look at it because we had all three, but what we realized is that actually just having all three doesn’t really help you with the ultimate goal of what you want from this platform, and having more of each of the types of data didn’t really help us with that, either. 
So, taking a step back from there and when we really looked at it, the lesson that we learned in our view on observability is really more from an end-user perspective, rather than a data type or data input perspective.And really, from an end-user perspective, if you think about why you want to use your monitoring tool or your observability tool, you really want to be notified of issues and remediate them as quickly as possible. And to do that, it really just comes down to answering three questions. “Can I get notified when something is wrong? Yes or no? Do I even know something is wrong?”The second question is, “Can I triage it quickly to know what the impact is? Do I know if it’s impacting all of my customers or just a subset of them, and how bad is the issue? Can I go back to sleep if I’m being paged at two o’clock in the morning?”And the third one is, “Can I figure out the underlying root cause to the problem and go and actually fix it?” So, this is how we think about the problem now, is from the end-user perspective. And it’s not that you don’t need metrics, logs, or distributed traces to solve the problem, but we are now orienting our solution around solving the problem for the end-user, as opposed to just orienting our solution around the three data types, per se.Corey: I’m going to self-admit to a fun billing experience I had once with a different monitoring vendor whom I will not name because it turns out, you can tell stories, you can name names, but doing both gets you in trouble. It was a more traditional approach in a simpler time, and they wound up sending me a message saying, “Oh, we’re hitting rate limits on CloudWatch. 
Go ahead and open a ticket asking for them to raise it.” And in a rare display of foresight, AWS responded to my ticket with a, “We can do this, but understand at this level of concurrency, it will cost something like $90,000 a month in increased charges, with that frequency, for that many metrics.” And that was roughly twice what our AWS bill was in those days, and, “Oh.” So, I’m curious as to how you can offer predictable pricing when you can have things that emit so much data so quickly. I believe you when you say you can do it; I’m just trying to understand the philosophy of how that works.

Martin: As I said earlier, we started to approach this by trying to solve it in a very engineering fashion where we just wanted to create more efficient backend technology so that it would be cheaper for the increased amount of data. What we realized over time is that no matter how much cheaper we make it, the amount of data being produced, especially from monitoring and observability, kept increasing, and not even in a linear fashion but in an exponential fashion. And because of that, it really switched the problem not to how efficiently can we store this, it really changed our focus of the problem to how are our users using this data, and do they even understand the data that’s being produced? So, in addition to the couple of properties I mentioned earlier, around cost accounting and rate-limiting—those are definitely required—the other things we try to make available for our end-users are introspection tools such that they understand the type of data that’s being produced.
It’s actually very easy in the monitoring and observability world to write a single line of code that actually produces a lot of data, and most developers don’t understand that that single line of code produces so much data.

So, our approach to this is to provide a tool so that developers can introspect and understand what is produced on the backend side, not what is being inputted from their code, and then not only have an understanding of that but also dynamic ways to deal with it. So that again, when they hit the rate limit, they don’t just have to monitor it less, they understand that, “Oh, I inserted this particular label and now I have 20 times the amount of data that I needed before. Do I really need that particular label in there? And if not, perhaps dropping it dynamically on the server-side is a much better way of dealing with that problem than having to roll back your code and change your metric instrumentation.” So, for us, the way to deal with it is not to just make the backend even more efficient, but really to have end-users understand the data that they’re producing, and make decisions on which parts of it are really useful and which parts of it they perhaps do not want, or perhaps want to retain for shorter periods of time, for example, and then allow them to actually implement those changes on that data on the backend. And that is really how the end-users control the bills and the cost themselves.

Corey: So, there are a number of different companies in the observability space that have different approaches to what they solve for. In some cases, to be very honest, it seems like, well, I have 15 different observability and monitoring tools. Which ones do you replace? And the answer is, “Oh, we’re number 16.” And it’s easy to be cynical and down on that entire approach, but then you start digging into it and they’re actually right.

I didn’t expect that to be the case.
What was your perspective that made you look around the, let’s be honest, fairly crowded landscape of observability companies’ tools that gave insight into the health, status, and well-being of various applications in different ways, and say, “You know, no one’s quite gotten this right, yet. I have a better idea.”

Martin: Yeah, you’re completely correct, and perhaps the previous environments that everybody was operating in, there were a lot of different tools for different purposes. A company would purchase an infrastructure monitoring tool, or perhaps even a network monitoring tool, and then they would have, perhaps, an APM solution for the applications, and then perhaps BI tools for the business. So, there was always historically a collection of different tools to go and solve this problem. And I think, again, what has really happened recently with this shift to cloud-native is that the need for a lot of this data to be in a single tool has become more important than ever. So, you think about your microservices running on a single container today, if a single container dies in isolation without knowing, perhaps, which microservice was running on it doesn’t mean very much, and just having that visibility is not going to be enough, just like if you don’t know which business use case that microservice was serving, that’s not going to be very useful for you, either.

So, with cloud-native architecture, there is more of a need to have all of this data and visibility in a single tool, which hasn’t historically happened. And also, none of the existing tools today—so if you think about both the existing APM solutions out there and the existing hosted solutions that exist in the world today, none of them were really built for a cloud-native environment because you can think about even the timing that these companies were created at, you know, back in the early 2010s, Kubernetes and containers weren’t really a thing.
So, a lot of these tools weren’t really built for the modern architecture that we see most companies shifting towards. So, the opportunity was really to build something for where we think the industry and everyone’s technology stack was going to be as opposed to where the technology stack has been in the past before. And that was really the opportunity there, and it just so happened that we had built a lot of these solutions for a similar type environment for Uber many years before. So, leveraging a lot of our lessons learned there put us in a good spot to build a new solution that we believe is fairly different from everything else that exists today in the market, and it’s going to be a good fit for companies moving forward.Corey: So, on your website, one of the things that you, I assume, put up there just to pick a fight—because if there’s one thing these people love, it’s fighting—is a use case is outgrowing Prometheus. The entire story behind Prometheus is, “Oh, it scales forever. It’s what the hyperscalers would use. This came out of the way that Google does things.” And everyone talks about Google as if it’s this mythical Valhalla place where everything is amazing and nothing ever goes wrong. I’ve seen the conference talks. And that’s great. What does outgrowing Prometheus look like?Martin: Yeah, that’s a great question, Corey. So, if you look at Prometheus—and it is the graduated and the recommended monitoring tool for cloud-native environments—if you look at it and the way it scales, actually, it’s a single binary solution, which is great because it’s really easy to get started. You deploy a single instance, and you have ingestion, storage, and visibility, and dashboarding, and alerting, all packaged together into one solution, and that’s definitely great. 
And it can scale by itself to a certain point and is definitely the recommended starting point, but as you really start to grow your business, increase your cluster sizes, increase the number of applications you have, it actually isn’t a great fit for horizontal scale. So, by default, there isn’t really high availability and horizontal scale built into Prometheus, and that’s why other projects in the CNCF, such as Cortex and Thanos, were created to solve some of these problems.

So, we looked at the problem in a similar fashion, and when we created M3, the open-source metrics platform that came out of Uber, it was also approaching it from this different perspective where we built it to be horizontally scalable, and highly reliable from the beginning, but yet, we don’t really want it to be a, let’s say, competing project with Prometheus. So, it is actually something that works in tandem with Prometheus, in the sense that it can ingest Prometheus metrics and you can issue Prometheus query language queries against it, and it will fulfill those. But it is really built for a more scalable environment. And I would say that once a company starts to grow and they run into some of these pain points and these pain points are surrounding how reliable a Prometheus instance is, how you can scale it up beyond just giving it more resources on the VM that it runs on; vertical scale runs out at a certain point. Those are some of the pain points that a lot of companies do run into and need to solve eventually. And there are various solutions out there, both in open-source and in the commercial world, that are designed to solve those pain points. M3 being one of the open-source ones and, of course, Chronosphere being one of the commercial ones.

Corey: This episode is sponsored in part by Salesforce. Salesforce invites you to “Salesforce and AWS: What’s Ahead for Architects, Admins and Developers” on June 24th at 10AM, Pacific Time. It’s a virtual event where you’ll get a first look at the latest innovations of the Salesforce and AWS partnership, and have an opportunity to have your questions answered. Plus you’ll get to enjoy an exclusive performance from Grammy Award winning artist The Roots! I think they’re talking about a band, not people with super user access to a system. Registration is free at salesforce.com/whatsahead.

Corey: Now, you’ve also gone ahead and more or less dangled raw meat in front of a tiger in some respects here because one of the things that you wind up saying on your site of why people would go with Chronosphere is, “Ah, this doesn’t allow for bill spike overages as far as what the Chronosphere bill is.” And that’s awesome. I love predictable pricing. It’s sort of the antithesis of cloud bills. But there is the counterargument, too, which is with many approaches to monitoring, I don’t actually care what my monitoring vendor is going to charge me because they wind up costing me five times more, just in terms of CloudWatch charges. How does your billing work? And how do you avoid causing problems for me on the AWS side, or other cloud provider? I mean, again, GCP and Azure are not immune from this.

Martin: So, if you look at the built-in solutions by the cloud providers, a lot of those metrics and monitoring you get from those like CloudWatch or Stackdriver, a lot of it you get included for free with your AWS bill already. It’s only if you want additional data and additional retention, do you choose to pay more there. So, I think a lot of companies do use those solutions for the default set of monitoring that they want, especially for the AWS services, but generally, a lot of companies have custom monitoring requirements outside of that in the application tier, or even more detailed monitoring in the infrastructure that is required, especially if you think about Kubernetes.

Corey: Oh, yeah.
And then I see people using CloudWatch as basically a monitoring, or metric, or log router, which at its price point, don’t do that. [laugh]. It doesn’t end well for anyone involved.Martin: A hundred percent. So, our solution and our approach is a little bit different. So, it doesn’t actually go through CloudWatch or any of these other inbuilt cloud-hosted solutions as a router because, to your point, there’s a lot of cost there as well. It actually goes and collects the data from the infrastructure tier or the applications. And what we have found is that not only does the bill for monitoring climb exponentially—and not just as you grow; especially as you shift towards cloud-native architecture—our very first take of solving that problem is to make the backend a lot more efficient than before so it just is cheaper overall.And we approached it that way at Uber, and we had great results there. So, when we created an—originally before M3, 8% of Uber’s infrastructure bill was spent on monitoring all the infrastructure and the application. And by the time we were done with M3, the cost was a little over 1%. So, the very first solution was just make it more efficient. And that worked for a while, but what we saw is that over time, this grew again.And there wasn’t any more efficiency, we could crank out of the backend storage system. There’s only so much optimization you can do to the compression algorithms in the backend and how much you can get there. So, what we realized the problem shifted towards was not, can we store this data more efficiently because we’re already reaching limitations there, and what we noticed was more towards getting the users of this data—so individual developers themselves—to start to understand what data is being produced, how they’re using it, whether it’s even useful, and then taking control from that perspective. 
And this is not a problem isolated to the SRE team or the observability team anymore; if you think about modern DevOps practices, every developer needs to take control of monitoring their own applications. So, this responsibility is really in the hands of the developers.And the way we approached this from a Chronosphere perspective is really in four steps. The first one is that we have cost accounting so that every developer, and every team, and the central observability team know how much data is being produced. Because it’s actually a hard thing to measure, especially in the monitoring world. It’s—Corey: Oh, yeah. Even AWS bills get this wrong. Like if you’re sending data between one availability zone to another in the same region, it charges a penny to leave an AZ and a penny to enter an AZ in that scenario. And the way that they reflect this on the bill is they double it. So, if you’re sending one gigabyte across AZ link in a month, you’ll see two gigabytes on the bill and that’s how it’s reflected. And that is just a glimpse of the monstrosity that is the AWS billing system. But yeah, exposing that to folks so they can understand how much data their application is spitting off? Forget it. That never happens.Martin: Right. Right. And it’s not even exposing it to the company as a whole, it’s to each use case, to each developer so they know how much data they are producing themselves. They know how much of the bill is being consumed. And then the second step in that is to put up bumper lanes to that so that once you hit the limit, you don’t just get a surprise bill at the end of the month.When each developer hits that limit, they rate-limit themselves and they only impact their own data; there is no impact to the other developers or to the other teams, or to the rest of the company. 
So, we found that those two were necessary initial steps, and then there were additional steps beyond that, to help deal with this problem.Corey: So, in order for this to work within a multi-day lag, in some cases, it’s a near certainty that you’re looking at what is happening and the expense that is being incurred in real-time, not waiting for it to pass its way through the AWS billing system and then do some tag attribution back.Martin: A hundred percent. It’s in real-time for the stream of data. And as I mentioned earlier, for the monitoring data we are collecting, it goes straight from the customer environment to our backend so we’re not waiting for it to be routed through the cloud providers because, rightly so, there is a multi-day or multi-hour delay there. So, as the data is coming straight to our backend, we are actively in real-time measuring that and cost accounting it to each individual team. And in real-time, if the usage goes above what is allocated, will actually limit that particular team or that particular developer, and prevent them by default from using more. And with that mechanism, you can imagine that’s how the bill is controlled and controlled in real-time.Corey: So, help me understand, on some level; is your architecture then agent-based? Is it a library that gets included in the application code itself? All of the above and more? Something else entirely? Or is this just such a ridiculous question that you can’t believe that no one has ever asked it before?Martin: No, it’s a great question, Corey, and would love to give some more insight there. So, it is an agent that runs in the customer environment because it does need to be something there that goes and collects all the data we’re interested in to send it to the backend. This agent is unlike a lot of APM agents out there where it does, sort of, introspection, things like that. 
We really believe in the power of the open-source community, and in particular, open-source standards like the Prometheus format for metrics. So, what this agent does is it actually goes and discovers Prometheus endpoints exposed by the infrastructure and applications, and scrapes those endpoints to collect the monitoring data to send to the backend.

And that is the only piece of software that runs in our customer environments. And then from that point on, all of the data is in our backend, and that’s where we go and process it and get visibility into the end-users as well as store it and make it available for alerting and dashboarding purposes as well.

Corey: So, when did you found Chronosphere? I know that you folks recently raised a Series B—congratulations on that, by the way; that generally means, at least if I understand the VC world correctly, that you’ve established product-market fit and now we’re talking about let’s scale this thing. My experience in startup land was, “Oh, we’ve raised a Series B, that means it’s probably time to bring in the first DevOps hire.” And that was invariably me, and I wound up screaming and freaking out for three months, and then things were better. So, that was my exposure to Series B.

But it seems like, given what you do, you probably had a few SRE folks kicking around, even on the product team because everything you’re saying so far absolutely resonates with the experiences of someone who has run these large-scale things in production. No big surprise there. Is that where you are? I mean, how long have you been around?

Martin: Yeah, so we’ve been around for a couple of years thus far—so still a relatively new company, for sure. A lot of the core team were the team that both built the underlying technology and also ran it in production for many years at Uber, and that team is now here at Chronosphere. So, you can imagine from the very beginning, we had DevOps and SREs running this hosted platform for us. 
And it’s the folks that actually built the technology and ran it for years running it again, outside of Uber now. And then to your first question, yes, we did establish product-market fit fairly early on, and I think that is also because we could leverage a lot of the technology that we had built at Uber, and it sort of gave us a boost to have a product ready for the market much faster.

And what we’re seeing in the industry right now is the adoption of cloud-native is so fast that it’s sort of accelerating the need for a new monitoring solution, because historical solutions, perhaps, cannot handle a lot of the use cases there. It’s a new architecture, it’s a new technology stack, and we have the solution purpose-built for that particular stack. So, we are seeing fairly fast acceleration and adoption of our product right now.

Corey: One problem that an awful lot of monitoring slash observability companies have gotten into in the last few years—at least it feels this way, and maybe I’m wildly incorrect—is that it seems that the target market is the Ubers of the world, the hyperscalers where once you’re at that scale, then you need a tool like this, but if you’re just building a standard three-tier web app, oh, you’re nowhere near that level of scale. And the problem with go-to-market in those stories inherently seems that by the time you are a hyperscaler, you have already built a somewhat significant observability apparatus, otherwise you would not have survived or stayed up long enough to become a hyperscaler. How do you find that the on-ramp looks? I mean, your website does talk about, “When you outgrow Prometheus.” Is there a certain point of scale that customers should be at before they start looking at things like Chronosphere?

Martin: I think if you think about the companies that are born in the cloud today and how quickly they are running and they are iterating their technology stack, monitoring is so critical to that. 
The real-time visibility of these changes that are going out multiple times a day is critical to the success and growth of a lot of new companies. And because of how critical that piece is, we’re finding that you don’t have to be a giant hyperscaler like Uber to need technology like this. And as you rightly pointed out, you need technology like this as you scale up. And what we’re finding is that while a lot of large tech companies can invest a lot of resources into hiring these teams and building out custom software themselves, generally, it’s not a great investment on their behalf because those are not companies that are selling monitoring technology as their core business.

So generally, what we find is that it is better for companies to perhaps outsource or purchase, or at least use open-source solutions to solve some of these problems rather than custom-build in-house. And we’re finding that earlier and earlier on in a company’s lifecycle, they’re needing technology like this.

Corey: Part of the problem I always ran into was—again, I come from the old world of grumpy Unix sysadmins—for me, using Nagios was my approach to monitoring. And that’s great when you have a persistent stateful, single node or a couple of single nodes. And then you outgrow it because well, now everything’s ephemeral and by the time you realize that there’s an outage or an issue with a container, the container hasn’t existed for 20 minutes. And you better have good telemetry into what’s going on and how your application behaves, especially at scale because at that point, edge cases, one-in-a-million events happen multiple times a second, depending upon scale, and that’s a different way of thinking. I’ve been somewhat fortunate in that, in my experience at least, I’ve not usually had to go through those transformative leaps.

I’ve worked with Prometheus, I’ve worked with Nagios, but never in the same shop. That’s the joy of being a consultant. 
You go into one environment, you see what they’re doing and you take notes on what works and what doesn’t, you move on to the next one. And it’s clear that there’s a definite defined benefit to approaching observability in a more modern way. But I despair the idea of trying to go from one to the other. And maybe that just speaks to a lack of vision for me.

Martin: No, I don’t think that’s the case at all, Corey. I think we are seeing a lot of companies do this transition. I don’t think a lot of companies go and ditch everything that they’ve done and the things that they put years of investment into; there’s definitely a gradual migration process here. And what we’re seeing is that a lot of the newer projects, newer environments, newer efforts that have been kicked off are being monitored and observed using modern technology like Prometheus.

And then there’s also a lot of legacy systems which are still going to be around and legacy processes which are still going to be around for a very long time. It’s actually something we had to deal with at Uber as well; we were actually using Nagios and a StatsD Graphite stack for a very long time before switching over to a more modern tag-like system like Prometheus. So—

Corey: Oh, modern Nagios. What was it, uh… that’s right, Icinga. That’s what it was.

Martin: Yes, yes. It was actually the system that we were using at Uber. And I think for us, it’s not just about ditching all of that investment; it’s really about supporting this migration as well. And this is why in the open-source technology M3, we actually support both the more legacy data types, like StatsD and the Graphite query language, as well as the more modern types like Prometheus and PromQL. 
And having support for both allows for a migration and a transition.

And not even a complete transition; I’m sure there will always be StatsD and Graphite data in a lot of these companies because they’re just legacy applications that nobody owns or touches anymore, and they’re just going to be lying around for a long time. So, it’s actually something that we proactively get ahead of and ensure that we can support both use cases even though we see a lot of companies trending towards the modern technology solutions, for sure.

Corey: The last point I want to raise has always been a personal, I guess, area of focus for me. I allude to it, sometimes; I’ve done a Twitter thread or two on it, but on your website, you say something that completely resonates with my entire philosophy, and to be blunt is why in many cases, I’m down on an awful lot of vendor tooling across a wide variety of disciplines. On the open-source page on your site, near the bottom, you say, and I quote, “We want our end-users to build transferable skills that are not vendor or product-specific.” And I don’t think I’ve ever seen a vendor come out and say something like that. Where did that come from?

Martin: Yeah. If you look at the core of the company, it is built on top of open-source technology. So, it is a very open core company here at Chronosphere, and we really believe in the power of the open-source community and in particular, perhaps not even individual projects, but industry standards and open standards. So, this is why we don’t have a proprietary protocol, or proprietary agent, or proprietary query language in our product because we truly believe in allowing our end-users to build these transferable skills and industry-standard skills. And right now that is using Prometheus as the client library for monitoring and PromQL as the query language.

And I think it’s not just a transferable skill that you can bring with you across multiple companies, it is also the power of that broader community. 
So, you can imagine now that there is a lot more sharing of, “Hey, I am monitoring, for example, MongoDB. How should I best do that?” Those skills can be shared because the common language that they’re all speaking, the queries that everybody is sharing with each other, the dashboards everybody is sharing with each other, are all, sort of, open-source standards now. And we really believe in the power of that and we really do everything we can to promote that. And that is why in our product, there isn’t any proprietary query language, or definitions of dashboarding, or alerting, or anything like that. So yeah, it is definitely just a core tenet of the company, I would say.

Corey: It’s really something that I think is admirable. I’ve known too many people who wind up, I guess, stuck in various environments where the thing that they work on is an internal application to the company, and nothing else like it exists anywhere else, so if they ever want to change jobs, they effectively have a black hole on their resume for a number of years. This speaks directly to the opposite. It seems like it’s not built on a lock-in story; it’s built around actually solving problems. And I’m a little ashamed to say how refreshing that is [laugh] just based upon what that says about our industry.

Martin: Yeah, Corey. And I think what we’re seeing is actually the power of these open-source standards, let’s say. Prometheus is actually having effects on the broader industry, which I think is great for everybody. So, while a company like Chronosphere is supporting these from day one, you see how pervasive the Prometheus protocol and the query language are that actually all of these probably more traditional vendors providing proprietary protocols and proprietary query languages all actually have to have Prometheus—or not ‘have to have,’ but we’re seeing that more and more of them are having Prometheus compatibility as well. 
And I think that just speaks to the power of the industry, and it really benefits all of the end-users and the industry as a whole, as opposed to the vendors, which we are really happy to be supporters of.Corey: Thank you so much for taking the time to speak with me today. If people want to learn more about what you’re up to, how you’re thinking about these things, where can they find you? And I’m going to go out on a limb and assume you’re also hiring.Martin: We’re definitely hiring right now. And you can find us on our website at chronosphere.io or feel free to shoot me an email directly. My email is martin@chronosphere.io. Definitely massively hiring right now, and also, if you do have problems trying to monitor your cloud-native environment, please come check out our website and our product.Corey: And we will, of course, include links to that in the [show notes 00:37:41]. Thank you so much for taking the time to speak with me today. I really appreciate it.Martin: Thanks a lot for having me, Corey. I really enjoyed this.Corey: Martin Mao, CEO and co-founder of Chronosphere. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment speculating about how long it took to convince Martin not to name the company ‘Observability Manager Chronosphere Manager.’Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
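As a rough illustration of the real-time cost accounting and per-team limits Martin describes in this episode, here is a minimal sketch. It is not Chronosphere's implementation: the class name `TeamQuotaTracker`, the team names, and the limits are all hypothetical, and it assumes usage is counted in data points per fixed accounting window.

```python
from collections import defaultdict


class TeamQuotaTracker:
    """Hypothetical per-team accounting: counts accepted data points and
    rejects writes from a team once that team's own limit is reached."""

    def __init__(self, limits, default_limit=0):
        # limits: mapping of team name -> data points allowed per window
        self.limits = dict(limits)
        self.default_limit = default_limit
        self.usage = defaultdict(int)    # team -> points accepted this window
        self.dropped = defaultdict(int)  # team -> points rejected this window

    def ingest(self, team, points=1):
        """Account for `points` from `team`; return True if accepted."""
        limit = self.limits.get(team, self.default_limit)
        if self.usage[team] + points > limit:
            # Over budget: only this team's data is affected.
            self.dropped[team] += points
            return False
        self.usage[team] += points
        return True

    def reset_window(self):
        """Start a new accounting window (for example, once a minute)."""
        self.usage.clear()
        self.dropped.clear()


tracker = TeamQuotaTracker({"payments": 3, "search": 2})
print(tracker.ingest("payments", 2))  # within the payments limit
print(tracker.ingest("payments", 2))  # would exceed the payments limit
print(tracker.ingest("search", 2))    # search is unaffected by payments
```

The point of keeping usage and limits keyed by team is the isolation property from the conversation: one team exceeding its allocation rate-limits only itself, and every team can see how much data it is producing while the window is still open.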
About AJ

AJ Yawn is a seasoned cloud security professional who possesses over a decade of senior information security experience with extensive experience managing a wide range of cybersecurity compliance assessments (SOC 2, ISO 27001, HIPAA, etc.) for a variety of SaaS, IaaS, and PaaS providers.

AJ advises startups on cloud security and serves on the Board of Directors of the ISC2 Miami chapter as the Education Chair. He is also a Founding Board member of the National Association of Black Compliance and Risk Management Professionals, regularly speaks on information security podcasts and at events, and contributes blogs and articles to the information security community, including publications such as CISOMag, InfosecMag, HackerNoon, and ISC2.

Before ByteChek, AJ served as a senior member of a national cybersecurity professional services firm's SOC-ISO-Healthcare compliance practice. AJ helped grow the practice from a 9-person team to over 100 team members serving clients all over the world. AJ also spent over five years on active duty in the United States Army, earning the rank of Captain.

AJ is relentlessly committed to learning and encouraging others around him to improve themselves. He leads by example and has earned several industry-recognized certifications, including the AWS Certified Solutions Architect-Professional, CISSP, AWS Certified Security Specialty, AWS Certified Solutions Architect-Associate, and PMP. AJ is also involved with the AWS training and certification department, volunteering with the AWS Certification Examination Subject Matter Expert program.

AJ graduated from Georgetown University with a Master of Science in Technology Management and from Florida State University with a Bachelor of Science in Social Science. 
While at Florida State, AJ played on the Florida State University Men's basketball team, participating in back-to-back trips to the NCAA tournament playing under Coach Leonard Hamilton.

Links:
ByteChek: https://www.bytechek.com/
Blog post, Everything You Need to Know About SOC 2 Trust Service Criteria CC6.0 (Logical and Physical Access Controls): https://help.bytechek.com/en/articles/4567289-everything-you-need-to-know-about-soc-2-trust-service-criteria-cc6-0-logical-and-physical-access-controls
LinkedIn: https://www.linkedin.com/in/ajyawn/
Twitter: https://twitter.com/AjYawn

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. 
When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by AJ Yawn, co-founder, and CEO of ByteChek. AJ, thanks for joining me.AJ: Thanks for having me on, Corey. Really excited about the conversation.Corey: So, what is ByteChek? It sounds like it’s one of those things—‘byte’ spelled as in computer term, not teeth, and ‘chek’ without a second C in it because frugality looms everywhere, and we save money where we can by sometimes not buying the extra letter or vowel. So, what is ByteChek?AJ: Exactly. You get it. ByteChek is a cybersecurity compliance software company, built with one goal in mind: make compliance suck less. 
And the way that we do that is by automating the worst part of compliance, which is evidence collection and taking out a lot of the subjective nature of dealing with an audit by connecting directly where the evidence lives and focusing on security.

Corey: That sound you hear is Pandora’s Box creaking open because back before I started focusing on AWS bills, I spent a few months doing a deep dive PCI project for workloads going into AWS because previously I’ve worked in regulated industries a fair bit. I’ve been a SOC 2 control owner, I’ve gone through the PCI process multiple times, I’ve dabbled with HIPAA as a consultant. And I thought, “Huh, there might be a business need here.” And it turns out, yeah, there really is.

The problem for me is that the work made me want to die. I found it depressing; it was dull; it was a whole lot of hurry up and wait. And that didn’t align with how I approach the world, so I immediately got the hell out of there. You apparently have a better perspective on, you know, delivering things companies need and don’t need to have constant novel entertainment every 30 seconds. So, how did you start down this path, and what set you on this road?

AJ: Yeah, great question. I started in the army as an information security officer, worked in a variety of different capacities. And when I left the military—mainly because I didn’t like sleeping outside anymore—I got into cybersecurity compliance consulting. And that’s where I got first into compliance and seeing the backwards way that we would do things with old document requests and screenshots. And I enjoyed the process because there was a reason for it, like you said.

There’s a business value to this, going through these compliance assessments. So, I knew they were important, but I hated the way we were doing it. And while there, I just got exposed to so many companies that had to go through this, and I just thought there was a better way. Like, typical entrepreneur story, right? 
You see a problem and you’re like, “There has to be a better way than grabbing screenshots of the EC2 console.” And set out to build a product to do that, to just solve that problem that I saw on a regular basis. And I tell people all the time, I was complicit in making compliance suck before. I was in that role and doing the things that I think sucked and not focused on security. And that’s what we’re solving here at ByteChek.

Corey: So, I’ve dabbled in it and sort of recoiled in horror. You’ve gone into this to the point where you are not only handling it for customers but in order to build software that goes in a positive direction, you have to be deeply steeped in this yourself. As you’re going down this process, what was your build process like? Were you talking to auditors? Were you talking to companies who had to deal with auditors? What aspects of the problem did you approach this from?

AJ: It’s really both aspects. And that’s where I think it’s just a really unique perspective I have because I’ve talked with a lot of auditors; I was an auditor and worked with auditors hand-in-hand and I understood the challenges of being an auditor, and the speed that you have to move when you’re in the consulting industry. But I also talked to a lot of customers because those were the people I dealt with on a regular basis, both from a sales perspective and from, you know, sitting there with the CTOs trying to figure out how to design a secure solution in AWS. So, I took it from the approach of you can’t automate compliance; you can’t fix the audit problem by only focusing on one side of the table, which is what currently happens: one side of the table is the client, and they get to automate evidence collection. But if the auditors can’t use that information that you’ve automated, then it’s still a bad process for both people. 
So, I took the approach of thinking about this from both, “How do I make this easier for auditors but also make it easier for the clients that are forced to undergo these audits?”

Corey: From a lot of perspectives, having compliance achieved, regardless of whether it’s PCI, whether it’s HIPAA, whether it’s SOC 2, et cetera, et cetera, et cetera, the reason that companies go through it is that it’s an attestation that they are, for better or worse, doing the right things. In some cases, it’s a requirement to operate in a regulated industry. In other cases, it’s required to process credit card transactions, which is kind of every industry, and in still others, it’s an easy shorthand way of saying that we’re not complete rank amateurs at these things, so as a result, we’re going to just pass over the result of our most recent SOC 2 audit to our prospective client, and suddenly, their security folks can relax and not send over weeks of questionnaires on the security front. That means that, for some folks, this is more or less a box-checking exercise rather than an actual good-faith effort to improve processes and posture.

AJ: Correct. And I think that’s actually the problem with compliance is it’s looked at as a check-the-box exercise, and that’s why there’s no security value out of it. That’s why you can pick up a SOC 2 report for someone that’s hosted on AWS, and you don’t see any mention of S3 buckets. You can do a ctrl+F, and you literally don’t see anything in a security evaluation about S3 buckets, which is just insane if you know anything about security on AWS. 
And I think it’s because of what you just described, Corey; they’re often asked to do this by a regulator, or by a customer, or by a vendor, and the result is, “Hurry up and get this report so that we can close this deal,”—or we can get to the next level with this customer, or with this investor, whatever it may be—instead of, let’s go through this, let’s have an auditor come in and look at our environment to improve it, to improve this security, which is where I hope the industry can get to because audits aren’t going anywhere; people are going to continue to do them and spend thousands of dollars on them, so there should be some security value out of them, in my opinion.Corey: I love using encrypting data at rest as an example of things that make varying amounts of sense because, sure, on your company laptops, if someone steals an employee’s laptop from a coffee shop, or from the back of their car one night, yeah, you kind of want the exposure to the company to be limited to replacing the hardware. I mean, even here at The Duckbill Group, where we are not regulated, we’ve gone through no formal audits, we do have controls in place to ensure that all company laptops have disk encryption turned on. It makes sense from that perspective. And in the data center, it was also important because there were a few notable heists where someone either improperly disposed drives and corporate data wound up on eBay or someone in one notable instance drove a truck through the side of the data center wall, pulled a rack into the bed of the truck and took off, which is kind of impressive [laugh] no matter how you slice it. But in the context of a hyperscale cloud provider like AWS, you’re not going to be able to break into their data centers, steal a drive—and of course, it has to be the right collection of drives and the right machines—and then find out how to wind up reassembling that data later.It’s just not a viable attack strategy. 
Now, you can spend days arguing with auditors around something like that, or you can check the box ‘encrypt at rest’ and move on. And very often, that is the better path. I’m not going to argue with auditors about that. I’m going to bend the knee, check the box, and get back to doing the business thing that I care about. That is a reasonable approach, is it not?

AJ: It is, but I think that’s the fault of the auditor because good security requires context. You can’t just apply a standard set of controls to every organization, as you’re describing, where I would much rather the auditor care about, “Are there any public S3 buckets? What is the security group situation like on that account? How are they managing their users? How are they storing credentials there in the cloud environment as well?

Are they using multiple accounts?” So many other things to care about other than protecting whether or not someone will be able to pull off the heist of the [laugh] 21st century. So, I think from a customer perspective, it’s the right model: don’t waste time arguing points with your auditors, but on the flip side, find an auditor that has more technical knowledge that can understand context, because security work requires good context and audits require context. And that’s the problem with audits now; we’re using one framework or several frameworks to apply to every organization. And I’ve been in the consulting space, like you, Corey, for a while. I have not seen the same environment in any customers. Every customer is different. Every customer has a different setup, so it doesn’t make sense to say every control should apply to every company.

Corey: And it feels on some level like you wind up getting staff accustomed to treating it as a box-checking exercise. “Right, it’s dumb that we wind up having to encrypt S3 buckets, but it’s for the audit to just check the box and move on.” So, people do it, then they move on to the next item, which is, “Okay, great. 
Are there any public S3 buckets?” And they treat it with the same, “Yeah, whatever. It’s for the audit,” box-checking approach? No, no, that one’s actually serious. You should invest significant effort and time into making sure that it’s right.AJ: Exactly. Exactly. And that’s where the value of a true compliance assessment that is focused on security comes into play because it’s no longer about checking the box, it’s like, “Hey, there’s a weakness here. A weakness that you probably should have identified. So, let’s go fix the weakness, but let’s talk about your process to find those weaknesses and then hopefully use some automation to remediate them.”Because a lot of the issues in the cloud you can trace back to why was there not a control in place to prevent this or detect this? And it’s sad that compliance assessments are not the thing that can catch those, that are not the other safeguard in place to identify those. And it’s because we are treating the entire thing like a check-the-box exercise and not pulling out those items that really matter, and that’s just focusing on security. Which is ultimately what these compliance reports are proving: customers are asking for these reports because they want to know if their data is going to be secure. And that’s what the report is supposed to do, but on the flip side, everyone knows the organization may not be taking it that serious, and they may be treating it like a check-the-box exercise.Corey: So, while I have you here, we’ll divert for a minute because I’m legitimately curious about this one. At a scale of legitimate security concern to, “This is a check-the-box exercise,” where do things like rotating passwords every 60 days or rotating IAM credentials every 90 days fall?AJ: I think it again depends on the organization. I don’t think that you need to rotate passwords regularly, personally. 
I don’t know how strong of a control that is if people are doing that, because they’re just going to start to make things up that are easy—Corey: Put the number at the end and increment by one every time. Great. Good work.AJ: Yep. So, I think again, it just depends on your organization and what the organization is doing. If you’re talking about managing IAM access keys and rotating those, are your engineers even using the CLI? Are they using their access keys? Because if they’re not, what are you rotating?You’re just rotating [laugh] stale keys that have never been used. Or if you don’t even have any IAM users, maybe you’re using SSO and they’re all using Okta or something else and they’re using an IAM role to come in there. So, it’s just—again, it’s context. And I think the problem is, a lot of folks don’t understand AWS or they don’t understand the cloud. And when I say, folks, I mean auditors.They don’t understand that, so they’re just going to ask for everything. “Did you rotate your passwords? Did you do this? Did you do that?” And it may not even make sense for you based off of your environment, but again, is it worth the fight with the auditor, or do you just give them whatever they want and so you can go about your way, whether or not it’s a legit security concern?Corey: Yeah. At some point, it’s not worth fighting with auditors, but if you find yourself wanting to fight the auditor all the time, at some level, you start to really resent the auditor that you have. To put that slightly more succinctly, how do you deal with non-technical auditors who don’t understand your environment—what they’re looking at—without strangling them?AJ: Great question. I think it goes back to before you hire your auditor. Oftentimes, in the sales process, there’s questions around, “Who’s come from the Big Four on your staff?” Or, “What control frameworks do you all specialize in?” Or, “How long will this take? 
How much will it cost?” But there’s very rarely any questions of, “Who on your staff knows AWS?”And it’s similar to going to the doctor: you wouldn't go to an eye doctor to get foot surgery. So, you shouldn’t go to an auditor who has never seen AWS, that doesn’t know what EC2 is, to evaluate your AWS environment. So, I think organizations have to start asking the right questions during the sales process. And it’s not about price or time or anything like that when you’re assessing who you’re going to work with from an auditing firm. It’s, are they qualified to actually evaluate the threats facing your organization so that you don’t get asked the stupid question.If you’re hosted on AWS, you shouldn’t be getting asked where are your firewall configurations. They should understand what security groups are and how they work. So, there’s just a level of knowledge that should be expected from the organization side. And I would say, if you’re working with a current auditor that you’re having those issues with, continue to ask the hard questions. Auditors that are not technical—I have a blog post on our website, and it says this is the section your auditors are the most scared of, and it’s the logical access section of your SOC 2 report.And auditors that are not technical run away from that section. So, just keep asking the hard questions, and they’ll either have to get the knowledge or they realize they’re not qualified to do the assessment and the marriage will split up kind of naturally from there. But I think it goes back to the initial process of getting your auditor. Don’t worry about cost or time, worry about their technical skills and if they’re qualified to assess your environment.Corey: And in 2021, that’s a very different story than it was the first few times I encountered auditors discovering the new era. At a startup, the auditor shows up. 
“Great, how do we get access to your Active Directory?” “Yeah, we don’t have one of those.” “Okay, how do we get on the internet here?” “Oh, here’s the wireless password.” “Wait, there’s not a separate guest network?” “That’s right.” “Well, now I have privileged access because I’m on your network.” It’s like, “Technically, that’s true because if you weren’t on this network, you wouldn’t be able to print to that printer over there in the corner. But that’s the only thing that it lets you do.” Everything else is identity-based, not IP address allow listing, so it’s purely a convenience for getting on the internet; you’re about as privileged on this network as you would be at a Starbucks half a world away. And they look at you like you’re an idiot. And that should have been the early warning sign that this was not going to be a typical audit conversation. Now, though, in 2021, it feels like it’s time to find a new auditor.
AJ: Exactly. Yeah. Especially because, unfortunately, last year security budgets were some of the first things cut when budgets were cut due to the global pandemic, so...
Corey: Well, I’m sure that’ll have no lasting repercussions.
AJ: Right. [laugh]. That’s always a great decision. So that means compliance budgets have been significantly slashed, because the first thing that gets cut is spending money on compliance activities. So, the cheaper option, oftentimes, is going to mean even fewer technical resources. Which is why I don’t think manual audits, human audits, are going to be a thing moving forward. I think companies are realizing that it doesn’t make sense to go through a process, hire an auditor who’s selling you on all this technical expertise, and then the staff that’s showing up and assigned to your project has never seen inside the AWS console and truly doesn’t even know what the cloud is. They think that iCloud on their phone is the only cloud that they’re familiar with.
And that’s what happens; organizations are sold that they’re going to get cybersecurity technical experts from these human auditors and then somebody shows up without that experience or expertise. So, you have to start to rely on tools, rely on technologies, and that can be native technologies in the cloud or third-party tools. But I don’t think you can actually do a good audit in the cloud manually anyways, no matter how technical you are. I know a lot about AWS but I still couldn’t do a great audit by myself in the cloud because auditing is time-based, you bill by the hour, and it doesn’t make sense for me to do all of those manual things that tools and technologies out there exist to do for us.
Corey: So, you started a software company aimed at this problem, not an auditing firm and not a consulting company. How are you solving this via the magic of writing code?
AJ: It’s just connecting directly where the evidence lives. So, for AWS, I actually tried to do this in a non-software way prior, when I was just a typical auditor, and I was just asking our clients to provision us cross-account access to go in their environment with some security permissions to get evidence directly. And that didn’t pass the sniff test at my consulting firm, even though some of the clients were open to it. But we built software to go out to the tools where the evidence directly lives and continuously assess the environment. So, that’s AWS, that’s GitHub, that’s Jira, that’s all of the different tools where you normally collect this evidence, and instead of having to prove things to auditors in a very manual fashion, by grabbing screenshots, you just simply connect using APIs to get the evidence directly from the source, which is more technically accurate. The way that auditing has been done in the past is using sampling methodologies and all these other outdated things, but that doesn’t really assess if all of your data stores are configured in the right way; if you’re actually backing up your data.
It’s me randomly picking one and saying, “Yes, you’re good to go.” So, we connect directly where the evidence lives and hopefully get to a point where when you get a SOC 2 report, you know that a tool checked it. So, you know that the tool went out and looked at every single data store, or they went out and looked at every single EC2 instance, or security group, whatever it may be, and it wasn’t dependent on how the auditor felt that day.Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.Corey: That sounds like it is almost too good to be true. 
And at first, my immediate response is, “This is amazing,” followed immediately by that’s transitioning into anger, that, “Why isn’t this a native thing that everyone offers?” I mean, to that end, AWS announced ‘Audit Manager’ recently, which I haven’t had the opportunity to dive into in any deep sense yet, because it’s still brand new, and they decided to release it alongside 15,000 other things, but does that start getting a little bit closer to something companies need? Or is it a typical day-one first release of an Amazon service where, “Well, at least we know the direction you’re heading in. We’ll check back in two years.”AJ: Exactly. It’s the day-one Amazon service release where, “Okay. AWS is getting into the audit space. That’s good to know.” But right now, at its core, that AWS service, it’s just not usable for audits, for several reasons.One, auditors cannot read the outputs of the information from Audit Manager. And it goes back to the earlier point where you can’t automate compliance, you can’t fix compliance if the auditors can’t use the information because then they’re going to go back to asking dumb questions and dumb evidence requests if they don’t understand the information coming out of it. And it’s just because of the output right now is a dump of JSON, essentially, in a Word document, for some strange reason.Corey: Okay, that is the perfect example right there of two worlds colliding. It’s like, “Well, we’re going to put JSON out of it because that’s the language developers speak. Well, what do auditors prefer?” “I don’t know, Microsoft Word?” “Okay, sounds good.” Even Microsoft Excel is a better answer than [laugh] that. And that is just… okay, that is just Looney Tunes awful.AJ: Yep. Yeah, exactly. And that’s one problem. The other problem is, Audit Manager requires a compliance manager. 
If we think about that tool, a developer is not going to use Audit Manager; it’s going to be somebody responsible for compliance. It requires them to go manually select every service that their company is using. A compliance manager, one, doesn’t even know what the services are; they have no clue what some of these services are. Two, how are they going to know if you’re using Lambda randomly somewhere, or Systems Manager randomly somewhere, or Elastic Beanstalk in one account or one region? Config here, Config there... they have to just go through it manually, and I’m like, “Well, that doesn’t make any sense because AWS knows what services you’re using. Why not just already have those selected and pull those in scope?” So, the chances of something being excluded are extremely high because it’s a really manual process for users to decide what they are actually assessing. And then lastly, the frameworks need a lot of work. Auditing is complex because there are standards and regulations and all of that, and there’s just a gap between the control and what AWS has listed as a service that addresses it. There were a few times where I looked at Audit Manager and I had no clue what they were mapping to and why they were mapping it. So, it’s a typical day-one service; it has some gaps, but I like the direction it’s going. I like the idea that an organization can go into their AWS console, hit a dashboard, and say, “Am I meeting SOC 2?” Or, “Am I meeting PCI?” I feel like this is a long time coming. I think you probably could have done it with Security Hub with less automation; you have to do some manual uploads there. But that’s the long answer to say it has a long way to go there, Corey.
Corey: I heard a couple of horror stories of, “Oh, my god, it’s charging me $300 a day and I can’t turn it off,” when it first launched. I assume that’s been fixed by now because the screaming has stopped. I have to assume it was. But it was gnarly and surprising people with bills.
And surprising people with things labeled ‘audit’ is never a great plan.AJ: Right. Yeah, the pricing was a little ridiculous as well. And I didn’t really understand the pricing model. But that’s typical of a new AWS service, I never really understand. That’s why I’m glad that you exist because I’m always confused at first about why things cost so much, but then if you give it some time, it starts to make a little bit more sense.Corey: Exactly. The first time you see a new pricing dimension, it’s novel and exciting and more than a little scary, and you dive into it. But then it’s just pattern recognition. It’s, “Oh, it’s one of these things again. Great.” It’s why it lends itself to a consulting story.So, you were in the army for a while. And as you mentioned, you got tired of sleeping on the ground, so you went into corporate life. And you were at a national cybersecurity professional services firm for a while. What was it that finally made you, I guess, snap for lack of a better term and, “I’m going to start my own thing?” Because in my case, it was, “Well, okay. I get fired an awful lot. Maybe I should try setting out my own shingle because I really don’t have another great option.” I don’t get the sense, given your resume and pedigree, that that was your situation?AJ: Not quite. I surprisingly, don’t do well with authority. So, a little bit I like to challenge things and question the norm often, which got me in trouble in the military, definitely got me in trouble in corporate life. But for me it was, I wanted to change; I wanted to innovate. I just kept seeing that there was a problem with what we were doing and how we were doing it, and I didn’t feel like I had the ability to innovate.Innovating in a professional services firm is updating a Google Sheet, or adding a new Google Form and sending that off to a client. That’s not really the innovation that I was looking to do. 
And I realized that if I wanted to create something that was going to solve this problem, I could go join one of the many startups out there that are out there trying to solve this problem, or I could just try to go do it myself and leverage my experience. And two worlds collided as far as timing and opportunity where I financially was in a position to take a chance like this, and I had the knowledge that I finally think I needed to feel comfortable going out on my own and just made the decision. I’m a pretty decisive person, and I decided that I was going to do it and just went with it.And despite going about this during the global pandemic, which presented its own challenges last year, getting this off the ground. But it was really—I collected a bunch of knowledge. I realized, maybe, two and a half years ago, actually, that I wanted to start my own business in this space, but I didn’t know what I wanted to do just yet. I knew I wanted to do software, I didn’t know how I wanted to do it, I didn’t know how I was going to make it work. But I just decided to take my time and learn as much as I can.And once I felt like I acquired enough knowledge and there was really nothing else I could gain from not doing this on my own, and I knew I wasn’t going to go join a startup to join them on this journey, it was a no-brainer just to pull the trigger.Corey: It seems to have worked out for you. I’m starting to see you folks crop up from time-to-time, things seem to be going well. How big are you?AJ: Yeah, we’re doing well. We have a team of seven of us now, which is crazy to think about because I remember when it was just me and my co-founder staring at each other on Zoom every day and wondering if they’re ever going to be anybody else on these [laugh] calls and talking to us. But it’s going really well. 
We have early customers that are happy and that’s all that I can ask for and they’re not just happy silently; they’re being really public about being happy about the platform, and about the process. And just working with people that get it and we’re building a lot of momentum.I’m having a lot of fun on LinkedIn and doing a lot of marketing efforts there as well. So, it’s been going well; it’s been actually going better than expected, surprisingly, which I don’t know, I’m a pretty optimistic entrepreneur and I thought things will go well, but it’s much better than expected, which means I’m sleeping a lot less than I expected, as well.Corey: Yeah, at some point, when you find yourself on the startup train, it’s one of those, “Oh, yeah. That’s right. My health is in the gutter, my relationships are starting to implode around me.” Balance is key. And I think that that is something that we don’t talk about enough in this world.There are periodically horrible tweets about how you should wind up focusing on your company, it should be the all-consuming thing that drives you at all hours of the day. And you check and, “Oh, who made that observation on Twitter? Oh, it’s a VC.” And then you investigate the VC and huh, “You should only have one serious bet, it should be your all-consuming passion” says someone who’s invested in a wide variety of different companies all at the same time, in the hopes that one of them succeeds. Huh.Almost like this person isn’t taking the advice they’re giving themselves and is incentivized to give that advice to others. Huh, how about that? And I know that’s a cynical take, but it continues to annoy me when I see it. Where do you stand on the balance side of the equation?AJ: Yeah, I think balance is key. I work a lot, but I rest a lot too. And I spend—I really hold my mornings as my kind of sacred place, and I spend my mornings meditating, doing yoga, working out, and really just giving back to myself. And I encourage my team to do the same. 
And we don’t just encourage it from just a, “Hey, you guys should do this,” but I talk to my team a lot about not taking ourselves too seriously.It’s our number one core value. It’s why our slogan is ‘make compliance suck less’ because it’s really my military background. We’re not being shot at; we’re sleeping at home every night. And while compliance and cybersecurity, it’s really important, and we’re protecting really important things, it’s not that serious to go all-in and to not have balance, and not to take time off not to relax. I mean, a part of what we do at ByteChek is we have a 10% rule, which means 10% of the week, I encourage my team to spend it on themselves, whether that’s doing meditation, going to take a nap.And these are work hours; you know, go out, play golf. I spent my 10% this morning playing golf during work hours. And I encourage all my team, every single week, spend four hours dedicated to yourself because there’s nothing that we will be able to do as a company without the people here being correct and being mentally okay. And that’s something that I learned a long time ago in the military. You spend a year away from home and you start to really realize what’s important.And it’s not your job. And that’s the thing. We hire a lot of veterans here because of my veteran background, and I tell all the vets that come here when you’re in the military, your job, your rank, and your day-to-day work is your identity. It’s who you are. You’re a Marine or you’re a Soldier, or you’re a Sailor; you’re an Airman if that’s a bad choice that you made. Sorry for my Air Force guys.Corey: Well, now there’s a Spaceman story as well, I’m told. But I don’t know if they call them spacemen or not, but remember, there’s a new branch to consider. And we can’t forget the Coast Guard either.AJ: If they don’t call themselves Spacemen, that is their name from now on. We just made it, today. If I ever meet somebody in the Space Force, [laugh] I’m calling them the Spacemen. 
That is amazing. But I tell our interns that we bring from the military, you have to strip that away.You have to become an individual because ByteChek is not your identity. And it won’t be your identity. And ByteChek’s not my identity. It’s something that I’m doing, and I am optimistic that it’s going to work out and I really hope that it does. But if it doesn’t, I’m going to be all right; my team is going to be all right and we’re going to all continue to go on.And we just try to live that out every day because there’s so many more important things going on in this world other than cybersecurity compliance, so we really shouldn’t take ourselves too seriously. And that advice of just grinding it out, and that should be your only focus, that’s only a recipe for disaster, in my opinion.Corey: AJ, thank you so much for taking the time to speak with me. If people want to hear more about what you have to say, where can they find you?AJ: They can find me on LinkedIn. That’s my one spot that I’m currently on. I am going to pop on Twitter here pretty soon. I don’t know when, but probably in the next few weeks or so. I’ve been encouraged by a lot of folks to join the tech community on Twitter, so I’ll be there soon.But right now they can find me on LinkedIn. I give four hours back a week to mentoring, so if you hear this and you want to reach out, you want to chat with me, send me a message and I will send you a link to find time on my calendar to meet. I spend four hours every Friday mentoring, so I’m open to chat and help anyone. And when you see me on LinkedIn, you’ll see me talking about diversity in cybersecurity because I think really the only way you can solve a cybersecurity skills shortage is by hiring more diverse individuals. So, come find me there, engage with me, talk to me; I’m a very open person and I like to meet new people. And that’s where you can find me.Corey: Excellent. 
And we’ll of course throw a link to your LinkedIn profile in the show notes. Thank you so much for taking the time to speak with me. It's really appreciated.
AJ: Yeah, definitely. Thank you, Corey. This is kind of like a dream come true to be on this podcast that I’ve listened to a lot and talk about something that I’m passionate about. So, thanks for the opportunity.
Corey: AJ Yawn, CEO and co-founder of ByteChek. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment that’s embedded inside of a Word document.
Announcer: This has been this week’s episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com, or wherever fine snark is sold. This has been a HumblePod production. Stay humble.
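Editor's sketch: AJ's contrast between sampling and checking every data store can be made concrete in a few lines. The following is a minimal illustration only, not ByteChek's implementation. The `DataStore` record, its field names, and the `INVENTORY` entries are all hypothetical stand-ins for evidence that a real tool would pull from provider APIs.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataStore:
    """Hypothetical inventory record; field names are illustrative, not a real AWS API shape."""
    name: str
    encrypted_at_rest: bool
    backups_enabled: bool
    publicly_accessible: bool


def failures(store: DataStore) -> List[str]:
    """Return the names of the checks this data store fails."""
    failed = []
    if not store.encrypted_at_rest:
        failed.append("encrypt-at-rest")
    if not store.backups_enabled:
        failed.append("backups")
    if store.publicly_accessible:
        failed.append("no-public-access")
    return failed


def audit_all(stores: List[DataStore]) -> Dict[str, List[str]]:
    """Full-population check: evaluate every store and report each one that fails."""
    return {s.name: failures(s) for s in stores if failures(s)}


def audit_sample(stores: List[DataStore], sample_size: int, seed: int) -> Dict[str, List[str]]:
    """Sampling-based check: evaluate only a random subset, as a manual audit might."""
    rng = random.Random(seed)
    picked = rng.sample(list(stores), k=min(sample_size, len(stores)))
    return {s.name: failures(s) for s in picked if failures(s)}


# Hypothetical inventory, invented purely for illustration.
INVENTORY = [
    DataStore("orders-db", True, True, False),
    DataStore("analytics-bucket", True, False, False),
    DataStore("legacy-exports", False, False, True),
    DataStore("user-uploads", True, True, False),
]

if __name__ == "__main__":
    print("full population:", audit_all(INVENTORY))
    print("sample of one:  ", audit_sample(INVENTORY, 1, seed=0))
```

The sampled audit can only ever report a subset of what the full pass reports, and with a sample of one it may report nothing at all, which is the weakness AJ attributes to sampling.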
About Mike
Besides his duties as The Duckbill Group’s CEO, Mike is the author of O’Reilly’s Practical Monitoring, and previously wrote the Monitoring Weekly newsletter and hosted the Real World DevOps podcast. He was previously a DevOps Engineer for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR.
Links: Software Engineering Daily podcast: https://softwareengineeringdaily.com/category/all-episodes/exclusive-content/Podcast/ Duckbillgroup.com: https://duckbillgroup.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to.
When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. 
So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I spent the past week guest hosting the Software Engineering Daily podcast, taking listeners over there on a tour of the clouds. Each day, I picked a different cloud and had a guest talk to me about their experiences with that cloud.Now, there was one that we didn’t talk about, and we’re finishing up that tour here today on Screaming in the Cloud. That cloud is the obvious one, and that is your own crappy data center. And my guest is Duckbill Group’s CEO and my business partner, Mike Julian. Mike, thanks for joining me.Mike: Hi, Corey. Thanks for having me back.Corey: So, I frequently say that I started my career as a grumpy Unix sysadmin. Because it isn’t like there’s a second kind of Unix sysadmin you’re going to see. And you were in that same boat. You and I both have extensive experience working in data centers. And it’s easy sitting here on the tech coast of the United States—we’re each in tech hubs cities—and we look around and yeah, the customers we talked to have massive cloud presences; everything we do is in cloud, it’s easy to fall into the trap of believing that data centers are a thing of yesteryear. Are they?Mike: [laugh]. Absolutely not. I mean, our own customers have tons of stuff in data centers. There are still companies out there like Equinix, and CoreSite, and DRC—is that them? I forget the name of them.Corey: DRT. Digital Realty [unintelligible 00:01:54].Mike: Digital Realty. Yeah. These are companies still making money hand over fist. 
People are still putting new workloads into data centers, so yeah, we’re kind of stuck with them for a while.
Corey: What’s fun is when I talked to my friends over in the data center sales part of the world, I have to admit, I went into those conversations early on with more than my own fair share of arrogance. And it was, “[laugh]. So, who are you selling to these days?” And the answer was, “Everyone, fool.” Because they are. People at large companies with existing data center footprints are not generally doing fire sales of their data centers, and one thing that we learned about cloud bills here at The Duckbill Group is that they only ever tend to go up with time. That’s going to be the case when we start talking about data centers as well. The difference there is that it’s not just an API call away to lease more space, put in some racks, buy some servers, get them racked. So, my question for you is, if we sit here and do the Hacker News thing (also known as the worst website on the internet) and take their first principles approach to everything, does that mean the people who are building out data centers are somehow doing it wrong? Did they miss a transformation somewhere?
Mike: No, I don’t think they’re doing it wrong. I think there’s still a lot of value in having data centers and having that sort of skill set. I do think the future is in cloud infrastructure, though. And whether that’s a public cloud, or private cloud, or something like that, I think we’re getting increasingly away from building on top of bare metal, just because it’s so inefficient to do. So yeah, I think at some point... and I feel like we’ve been saying this for years, that, “Oh, no, everyone’s missed the boat,” and here we are saying it yet again, like, “Oh, no. Everyone’s missing the boat.” You know, at some point, the boat’s going to frickin’ leave.
Corey: From my perspective, there are advantages to data centers. And we can go through those to some degree, but let’s start at the beginning.
Origin stories are always useful. What’s your experience working in data centers?
Mike: [laugh]. Oh, boy. Most of my career has been in data centers. And in fact, one interesting tidbit is that, despite running a company that is built on AWS consulting, I didn’t start using AWS myself until 2015. So, as of this recording, it’s 2021 now, so that means six years ago is when I first started using AWS. And before that, it was all in data centers. So, some of my most interesting stuff in the data center world was from Oak Ridge National Lab where we had hundreds of thousands of square feet of data center floor space across, like, three floors. And it was insane, just the amount of data center stuff going on there. A whole bunch of HPC, a whole bunch of just random racks of bullshit. So, it’s pretty interesting stuff. I think probably the most interesting bit I’ve worked on was when I was at a now-defunct company, Peak Hosting, where we had to figure out how to spin up a data center without having anyone at the data center, as in, there was no one there to do the spin-up. And that led into interesting problems, like you have multiple racks of equipment, like, thousands of servers just showed up on the loading dock. Someone’s got to rack them, but from that point, it all has to be automatic. So, how do you bootstrap entire racks of systems from nothing with no one physically there to start a bootstrap process? And that led us to build some just truly horrific stuff. And thank God that’s someone else’s problem, now. [laugh].
Corey: It makes you wonder whether, under the hood at all these cloud providers, they have something that’s a lot cleaner, and more efficient, and perfect, or if it’s a whole bunch of Perl tied together with bash and hope, like we always built.
Mike: You know what?
I have to imagine that even at AWS—and I know this is true at Facebook, where they have a massive data center footprint as well—there is a lot of work that goes into the bootstrap process, and a lot of these companies are building their own hardware to facilitate making that bootstrap process easier. When you’re trying to bootstrap, say, like, Dell or HP servers, the management cards only take you so far. And a lot of the stuff that we had to do was working around bugs in the HP management cards, or the Dell DRACs.Corey: Or you can wind up going with some budget whitebox service. I mean, Supermicro is popular, not that they’re ultra-low budget. But yeah, you can effectively build your own. And that leads down interesting paths, too. I feel like there’s a sweet spot where working on a data center and doing a build-out makes sense for certain companies.If you’re trying to build out some proof of concept, yeah, do it in the cloud; you don’t have to wait eight weeks and spend thousands of dollars; you can prove it out right now and spend a total of something like 17 cents to figure out if it’s going to work or not. And if it does, then proceed from there, if not shut it down, and here’s a quarter; keep the change. With data centers, a lot more planning winds up being involved. And is there a cutover at which point it makes sense to evacuate from a public cloud into a physical data center?Mike: You know, I don’t really think so. This came up on a recent Twitter Spaces that you and I did around, at what point does it really make sense to be hybrid, or to be all-in on data center? I made the argument that a large-scale HPC does not fit cloud workloads, and someone made a comment that, like, “What is large-scale?” And to me, large-scale was always, like—so Oak Ridge was—or is famous—for having supercomputing, and they have largely been in the top five supercomputers in the world for quite some time. A supercomputer of that size is tens of thousands of cores. 
And they’re running pretty much constant because of how expensive that stuff is to get time on. And that sort of thing would be just astronomically expensive in a cloud. But how many of those are there really?Corey: Yeah, if you’re an AWS account manager listening to this and reaching out with, “No, that’s not true. After committed spend, we’ll wind up giving you significant discounts, and a whole bunch of credits, and jump through all these hoops.” And, yeah, I know, you’ll give me a bunch of short-term contractual stuff that’s bounded for a number of years, but there’s no guarantee that stuff gets renewed at that rate. And let’s face it. If you’re running those kinds of workloads today, and already have the staff and tooling and processes that embrace that, maybe ripping all that out in a cloud migration where there’s no clear business value derived isn’t the best plan.Mike: Right. So, while there is a lot of large-scale HPC infrastructure that I don’t think particularly fits well on the cloud, there’s not a lot of that. There’s just not that many massive HPC deployments out there. Which means that pretty much everything below that threshold could be a candidate for cloud workloads, and probably would be much better. One of the things that I noticed at Oak Ridge was that we had a whole bunch of SGI HPC systems laying around, and 90% of the time they were idle.And those things were not cheap when they were bought, and at this point, they’re basically worth nothing. But they were idle most of the time, but when they were needed, they’re there, and they do a great job of it. With AWS and GCP and Azure HPC offerings, that’s a pretty good fit. Just migrate that whole thing over because it’ll cost you less than buying a new one. But if I’m going to migrate Titan or Gaea from Oak Ridge over to there, yeah, some AWS rep is about to have a very nice field day. 
That’d just be too much money.Corey: Well, I’d be remiss as a cloud economist if I didn’t point out that you can do this stuff super efficiently in someone else’s AWS account.Mike: [laugh]. Yes.Corey: There’s also the staffing question where if you’re a large blue-chip company, you’ve been around for enough decades that you tend to have some revenue to risk, where you have existing processes and everything is existing in an on-prem environment, as much as we love to tell stories about the cloud being awesome, and the capability increase and the rest, yadda, yadda, yadda, there has to be a business case behind moving to the cloud, and it will knock some nebulous percentage off of your TCO—because lies, damned lies, and TCO analyses are sort of the way of the world—great. That’s not exciting to most strategic-level execs. At least as I see the world. Given you are one of those strategic level execs, do you agree? Am I lacking nuance here?Mike: No, I pretty much agree. Doing a data center migration, you got to have a reason to do it. We have a lot of clients that are still running in data centers as well, and they don’t move because the math doesn’t make sense. And even when you start factoring in all the gains from productivity that they might get—and I stress the word might here—even when you factor those in, even when you factor in all the support and credits that Amazon might give them, it still doesn’t make enough sense. So, they’re still in data centers because that’s where they should be for the time because that’s what the finances say. And I’m kind of hard-pressed to disagree with them.Corey: While we’re here playing ‘ask an exec,’ I’m going to go for another one here. It’s my belief that any cloud provider that charges a penny for professional services, or managed services, or any form of migration tooling or offering at all to their customers is missing the plot. Clearly, since they all tend to do this, I’m wrong somewhere. 
But I don’t see how. Am I wrong, or are they?Mike: Yeah, I don’t know. I’d have to think about that one some more.Corey: It’s an interesting point because it’s—Mike: It is.Corey: —it’s easy to think of this as, “Oh, yeah. You should absolutely pay people to migrate in because the whole point of cloud is that it’s kind of sticky.” The biggest indicator of a big cloud bill this month is a slightly smaller one last month. And once people wind up migrating into a cloud, they tend not to leave despite all of their protestations to the contrary about multi-cloud, hybrid, et cetera, et cetera. And that becomes an interesting problem.It becomes an area—there’s a whole bunch of vendors that are very deeply niched into that. It’s clear that the industry as a whole thinks that migrating from data centers to cloud is going to be a boom industry for the next three decades. I don’t think they’re wrong.Mike: Yeah, I don’t think they’re wrong either. I think there’s a very long tail of companies with massive footprints staying in data centers that at some point are going to get out of the data center.Corey: For those listeners who are fortunate enough not to have had to come up the way that we did, can you describe what a data center is like inside?Mike: Oh, God.Corey: What is a data center? People have these mythic ideas from television and movies, and I don’t know, maybe some Backstreet Boys music video; I don’t know where it all comes from. What is a data center like? What does it do?Mike: I’ve been in many of these over my life, and I think they really fall into two groups. One is the one managed by a professional data center manager. And those tend to be sterile environments. Like, that’s the best way to describe it. They are white, filled with black racks. Everything is absolutely immaculate. There is no trash or other debris on the floor. Everything is just perfect. And it is freezingly cold.Corey: Oh, yeah. So, you’re in a data center for any length of time, bring a jacket. 
And the soulless part of it, too, is that it’s well-lit with fluorescent lights everywhere—Mike: Oh yeah.Corey: —and it’s never blinking, never changing. There are no windows. Time loses all meaning. And it’s strange to think about this because you don’t walk in and think, “What is that racket?” But there’s 10,000, 100,000 however many fans spinning all the time. It is super loud. It can clear 120 decibels in there, but it’s a white noise so you don’t necessarily hear it. Hearing protection is important there.Mike: When I was at Oak Ridge, we had—all of our data centers, we had a professional data center manager, so everything was absolutely pristine. And to get into any of the data centers, you had to go through a training; it was very simple training, but just, like, “These are things you do and don’t do in the data center.” And when you walked in, you had to put in earplugs immediately before you walked in the door. And it’s so loud just because of that, and you don’t really notice it because you can walk in without earplugs and, like, “Oh, it’s loud, but it’s fine.” And then you leave a couple hours later and your ears are ringing. So, it’s a weird experience.Corey: It’s awful. I started wearing earplugs every time I went in, just because it’s not just the pain because hearing loss doesn’t always manifest that way. It’s, I would get tired much more quickly.Mike: Oh, yeah.Corey: I would not be as sharp. It was, “What is this? Why am I so fatigued?” It’s noise.Mike: Yeah. And having to remember to grab your jacket when you head down to the data center, even though it’s 95 degrees outside.Corey: At some point, if you’re there enough—which you probably shouldn’t be—you start looking at ways to wind up storing one locally. I feel like there could be some company that makes an absolute killing by renting out parkas at data centers.Mike: Yeah, totally. The other group of data center stuff that I generally run into is the exact opposite of that. 
And it’s basically someone has shoved a couple racks in somewhere and they just kind of hope for the best.Corey: The basement. The closet. The hold of a boat, with one particular client we work with.Mike: Yeah. That was an interesting one. So, we had a—Corey and I had a client where they had all their infrastructure in the basement of a boat. And we’re [laugh] not even kidding. It’s literally in the basement of a boat.Corey: Below the waterline.Mike: Yeah, below the waterline. So, there was a lot of planning around, like, what if the hold gets breached? And like, who has to plan for that sort of thing? [laugh]. It was a weird experience.Corey: It turns out that what was hilarious about that was while they were doing their cloud migration into AWS, their account manager wasn’t the most senior account manager because, at that point, it was a small account, but they still stuck to their standard talking points about TCO, and better durability, and the rest, and it didn’t really occur to them to come back with a, “What if the boat sinks?” Which is the obvious reason to move out of that quote-unquote, “data center.”Mike: Yeah. It was a wild experience. So, that latter group of just everything’s an absolute wreck, like, everything—it’s just so much of a pain to work with, and you find yourself wanting to clean it up. Like, install new racks, do new cabling, put in a totally new floor so you’re not standing on concrete. You want to do all this work to it, and then you realize that you’re just putting lipstick on a pig; it’s still going to be a dirty old data center at the end of the day, no matter how much work you do to it. And you’re still running on the same crappy hardware you had, you’re still running on the same frustrating deployment process you’ve been working on, and everything still sucks, despite it looking good.Corey: This episode is sponsored in part by ChaosSearch. 
As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.Corey: The worst part is playing the ‘what is different here?’ Game. You rack twelve servers: eleven come up fine and the twelfth doesn’t.Mike: [laugh].Corey: It sounds like, okay, how hard could it be? Days. It can take days. In a cloud environment, you have one weird instance. Cool, you terminate it and start a new one and life goes on whereas, in a data center, you generally can’t send back a $5,000 piece of hardware willy nilly, and you certainly can’t do it same-day, so let’s figure out what the problem is.Is that some sub-component in the system? Is it a dodgy cable? Is it, potentially, a dodgy switch port? Is there something going on with that node? Was there something weird about the way the install was done if you reimage the thing? Et cetera, et cetera. 
And it leads down rabbit holes super quickly.Mike: People that grew up in the era of computing that Corey and I did, you start learning tips and tricks, and they sound kind of silly these days, but things like, you never create your own cables. Even though both of us still remember how to wire a Cat 5 cable, we don’t.Corey: My fingers started throbbing when you said that because some memories never fade.Mike: Right. You don’t. Like, if you’re working in a data center, you’re buying premade cables because they’ve been tested professionally by high-end machines.Corey: And you still don’t trust it. You have a relatively inexpensive cable tester in the data center, and when—I learned this when I was racking stuff the second time, it adds a bit of time, but every cable that we took out of the packaging before we plugged it in, and we tested on the cable tester just to remove that problem. And it still doesn’t catch everything because, welcome to the world of intermittent cables that are marginal that, when you bend a certain way, stop working, and then when you look at them, start working again properly. Yes, it’s as maddening as it sounds.Mike: Yeah. And then things like rack nuts. My fingers hurt just thinking about it.Corey: Think of them as nuts that bolts wind up screwing into but they’re square and they have clips on them so they clip into the standard rack cabinets, so you can screw equipment into them. There are different sizes of them, and of course, they’re not compatible with one another. And you have—they always pinch your finger and make you bleed because they’re incredibly annoying to put in and out. 
Some vendors have quick rails, which are way nicer, but networking equipment is still stuck in the ‘90s in that context, and there’s always something that winds up causing problems.Mike: If you were particularly lucky, the rack nuts that you had were pliable enough that you could pinch them and pull them out with your fingers, and hopefully didn’t do too much damage. If you were particularly unlucky, you had to reach for a screwdriver to try to pry it out, and inevitably stab yourself.Corey: Or sometimes pulling it out with your fingers, it’ll—like, those edges are sharp. It’s not the most high-quality steel in some cases, and it’s just you wind up having these problems. Oh, one other thing you learn super quickly, is first, always have a set of tools there because the one you need is the one you don’t have, and the most valuable tool you’ll have is a pair of wire cutters. And what you do when you find a bad cable is you cut it before throwing it away.Mike: Yep.Corey: Because otherwise someone who is very well-meaning but you will think of them as the freaking devil, will, “Oh, there’s a perfectly good cable sitting here in the trash. I’ll put it back with the spares.” So you think you have a failed cable, you grab another one from the pile of spares—remember, this is two in the morning, invariably, and you’re not thinking on all cylinders—and the problem is still there. Cut the cable when you throw it away.Mike: So, there are entire books that were written about these sorts of tips and tricks that everyone working with data centers just remembers. They learned it all. And most of the stuff is completely moot now. Like, no one really thinks about it anymore. Some people are brought up in computing in such a way that they never even learned these things, which I think is fantastic.Corey: Oh, I don’t wish this on anyone. 
This used to be a prerequisite skill for anyone who called themselves a systems administrator, but I am astonished when I talk to my AWS friends, the remarkably senior engineers I talk to who have never been inside of an AWS data center.Mike: Yeah, absolutely.Corey: That’s really cool. It also means you’re completely divorced from the thing you’re doing with code and the rest, and the thing that winds up keeping the hardware going. It also leads to a bit of a dichotomy where the people racking the hardware, in many cases, don’t understand the workloads that are on there because if you have the programming insight, and ability, and can make those applications work effectively, you’re probably going to go find a role that compensates far better than working in the data center.Mike: I [laugh] want to talk about supply chains. So, when you build a data center, you start planning about—let’s say, I’m not Amazon. I’m just, like, any random company—and I want to put my stuff into a data center. If I’m going to lease someone else’s data center—which you absolutely should—we’re looking at about a 180-day lead time. And it’s like, why? Like, that’s a long time. What’s—Corey: It takes that long to sign a real estate lease?Mike: Yeah.Corey: No. It takes that long to sign a real estate lease, wind up talking to your upstream provider, getting them to go ahead and run the thing—effectively—getting the hardware ordered and shipped in the right time window, doing the actual build-out once everything is in place, and I’m sure a few other things I’m missing.Mike: Yeah, absolutely. So yeah, you have all these things that have to happen, and all of them take for-freaking-ever. Getting Windstream on the phone to begin with, to even take your call, can often take weeks at a time. And then to get them to actually put an order for you, and then do the turnup. 
The turnup alone might be 90 days, where I’m just, “Hey, I’ve bought bandwidth from you, and I just need you to come out and connect the [BLEEP] cables,” might be 90 days for them to do it.And that’s ridiculous. But then you also have the hardware vendors. If you’re ordering hardware from Dell, and you’re like, “Hey, I need a couple servers.” Like, “Great. They’ll be there next week.” Instead, if you’re saying, “Hey, I need 500 servers,” they’re like, “Ooh, uh, next year, maybe.” And this is even pre-pandemic sort of thing because they don’t have all these sitting around.So, for you to get a large number of servers quickly, it’s just not a thing that’s possible. So, a lot of companies would have to buy well ahead of what they thought their needs would be, so they’d have massive amounts of unused capacity. Just racks upon racks of systems sitting there turned off, waiting for when they’re needed, just because of the ordering lead time.Corey: That’s what auto-scaling looks like in those environments because you need to have that stuff ready to go. If you have a sudden inrush of demand, you have to be able to scale up with things that are already racked, provisioned, and good to go. Sometimes you can have them halfway provisioned because you don’t know what kind of system they’re going to need to be in many cases, but that’s some up-the-stack level thinking. And again, finding failed hard drives and swapping those out, make sure you pull the right one or you just destroyed an array. And all these things that I just make Amazon’s problem.It’s kind of fun to look back at this and realize that we would get annoyed then with support tickets that took three weeks to get resolved in hardware, whereas now three hours in you and I are complaining about the slow responsiveness of the cloud vendor.Mike: Yeah, the amount of quick turnaround that we can have these days on cloud infrastructure was just unthinkable, running in data centers. We don’t run out of bandwidth now. 
Like, that’s just not a concern that anyone has. But when you’re running in a data center, and, “Oh, yeah. I’ve got an OC-3 line connected here. That’s only going to get me”—Corey: Which is something like—what is an OC-3? That’s something like, what, 20 gigabit, or—Mike: Yeah, something like that. It’s—Corey: Don’t quote me on that.Mike: Yeah. So, we’re going to have to look that up. So, it’s equivalent to a T-3, so I think that’s a 45 megabit?Corey: Yeah, that sounds about reasonable, yeah.Mike: So, you’ve got a T-3 line sitting here in your data center. Like that’s not terrible. And if you start maxing that out, well, you’re maxed out. You need more? Again, we’re back to the 90 to 180 day lead time to get new bandwidth.So, sucks to be you, which means you’d have to start planning your bandwidth ahead of time. And this is why we had issues like companies getting Slashdotted back in the day because when you capped the bandwidth out, well, you’re capped out. That’s it. That’s the game.Corey: Now, you’ve made the front page of Slashdot, a bunch of people visited your site, and the site fell over. That was sort of the way of the world. CDNs weren’t really a thing. Cloud wasn’t a thing. And that was just, okay, you’d bookmark the thing and try and remember to check it later.We talked about bandwidth constraints. One thing that I think the cloud providers do—at least the tier ones—that are just basically magic is full line rate between any two instances almost always. Well, remember, you have a bunch of different racks, and at the top of every rack, there’s usually a switch called—because we’re bad at naming things—top-of-rack switches. And just because everything that you have plugged in can get one gigabit to that switch—or 10 gigabit or whatever it happens to be—there is a constraint in that top-of-rack switch. 
So yeah, one server can talk to another one in a different rack at one gigabit, but then you have 20 different servers in each rack all trying to do something like that and you start hitting constraints.You do not see that in the public cloud environments; it is subsumed away, you don’t have to think about that level of nonsense. You just complain about what feels like the egregious data transfer charge.Mike: Right. Yeah. It was always frustrating when you had to order nice high-end switching gear from Cisco, or Arista, or take your pick of provider, and you got 48 ports in the top-of-rack, you got 48 servers all wired up to them—or 24 because we want redundancy on that—and that should be a gigabit for each connection, except when you start maxing it out, no, it’s nowhere even near that because the switch can’t handle it. And it’s absolutely magical, that the cloud provider’s like, “Oh, yeah. Of course, we handle that.”Corey: And you don’t have to think about it at all. One other use case that I did want to hit because I know we’ll get letters if we don’t, where it does make sense to build out a data center, even today, is if you have regulatory requirements around data residency. And there’s no cloud vendor in an area that suits. This generally does not apply to the United States, but there are a lot of countries that have data residency laws that do not yet have a cloud provider of their choice region, located in-country.Mike: Yeah, I’ll agree with that, but I think that’s a short-lived problem.Corey: In the fullness of time, there’ll be regions everywhere. Every build—a chicken in every pot and an AWS availability zone on every corner.Mike: [laugh]. Yeah, I think it’s going to be a fairly short-lived problem, which actually reminds me of even our clients that have data centers are often treating the data center as a cloud. 
So, a lot of them are using your favorite technology, Corey, Kubernetes, and they’re treating Kubernetes as a cloud, running Kube in AWS, as well, and moving workloads between the two Kube clusters. And to them, a data center is actually not really data center; it’s just a private cloud. I think that pattern works really well if you have a need to have a physical data center.Corey: And then they start doing a hybrid environment where they start expanding to a public cloud, but then they treat that cloud like just a place to run a bunch of VMs, which is expensive, and it solves a whole host of problems that we’ve already talked about. Like, we’re bad at replacing hard drives, or our data center is located on a corner where people love to get drunk on the weekends and smash into the power pole and take out half of the racks here. Things like that great, yeah, cloud can solve that, but cloud could do a lot more. You’re effectively worsening your cloud experience to improve your data center experience.Mike: Right. So, even when you have that approach, the piece of feedback that we give the client was, you have built such a thing where you have to cater to the lowest common denominator, which is the constraints that you have in the data center, which means you’re not able to use AWS the way that you should be able to use it so it’s just as expensive to run as a data center was. If they were to get rid of the data center, then the cloud would actually become cheaper for them and they would get more benefits from using it. So, that’s kind of a business decision for how they’ve structured it, and I can’t really fault them for it, but there are definitely some downsides to the approach.Corey: Mike, thank you so much for joining me here. If people want to learn more about what you’re up to, where can they find you?Mike: You know, you can find me at duckbillgroup.com, and actually, you can also find Corey at duckbillgroup.com. We help companies lower their AWS bills. 
So, if you have a horrifying bill, you should chat.Corey: Mike, thank you so much for taking the time to join me here.Mike: Thanks for having me.Corey: Mike Julian, CEO of The Duckbill Group and my business partner. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and then challenge me to a cable-making competition.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About GuyGuy Raz is a Sr. Systems Engineer at ExtraHop with previous experience as a Network Engineer and Solution Architect. Guy is one of the SMEs leading the unique ExtraHop approach to cloud-native NDR for the hybrid multi-cloud enterprise. Before joining the Sales Engineer team, Guy was one of the ExtraHop Solution Architects, responsible for conducting deep technical and business discovery sessions, assisting in troubleshooting and problem resolution during war-room and security/network investigations, and developing strategies for acquiring high-value data from the wire; requiring in-depth technical understanding of L2-L7 networking principles.Links: https://www.extrahop.com/ https://extrahop.com/demo TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. 
You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Once a year in San Francisco, if I find myself being overly cheerful, all I have to do is walk up and down the RSA Expo Hall and look at a bunch of vendors talking about how their on-premises product kind of sort of works in the cloud, and then I’m not overly cheerful anymore. One notable exception to this is a company called ExtraHop. I’ve spoken about them before and on this promoted episode, we’re going to dive a little bit deeper. Today, my guest is Senior Systems Engineer Guy Raz. 
Guy, thanks for taking the time to speak with me.Guy: Thanks, Corey, happy to be here.Corey: So, for those who have not caught previous episodes, or heard me ranting from the rooftop about it, at a very basic level for folks who have not even, I guess, dipped their toes in the RSA space because they, you know, want to be happy with their lives, what is ExtraHop?Guy: ExtraHop is a cloud-native approach for analyzing wire data. Historically, customers have, kind of, looked at TAPs and SPANs, but with cloud, there’s a ton of ways of getting this natively. You know, AWS, GCP, Azure give us ways of collecting this data. ExtraHop is a platform for analyzing that network traffic and, in real-time, providing context to application and security teams.Corey: So, when you take a look at that from, I guess, the perspective of security, it’s easy to sit here and say, “Oh, so how do you wind up thinking about security in a place or time of cloud?” Because there’s an awful lot of ways to view it: you can go down the path of, “Ah, I’m going to just use all the first-party tooling from my provider, and that’s it,” which, that could be fair. Alternatively, you could go down a different path of, “I’m going to just go ahead and buy whatever they’ll sell me at RSA,” which is great because the hardest part there is the booth attendees not making actual cash register sounds with their mouths when you walk past with an open checkbook. But security always feels like a thing that’s kind of an afterthought. It’s something that is tied too closely, on some level, to this idea that you’re never going to be secure, so you may as well just give up. It’s also something people only care about after it’s been a little too late, where they really should have been caring about it. How do you see that?Guy: It’s a really unfortunate space, but you’re absolutely right, Corey, there. What we end up seeing is that for a lot of customers, and just the industry as a whole, security tends to be an afterthought when it comes to cloud. 
They assume cloud-native solutions or built-in free solutions have their best foot forward, have their best instance in mind. And that’s not always the case. There’s a lot of, like you mentioned, built-in solutions that these cloud providers can give us.And while a lot of them are kind of scratching the surface of what security in the cloud can provide, there’s a lot that it kind of leaves unanswered. And the unfortunate thing is, the cloud journey isn’t always the easiest. There’s a lot of lift-and-shift, there’s a lot of refactor, and sometimes the security portion of that gets put on the side street until it becomes a priority or an event happens.Corey: So, given that you can effectively not even swing a dead cat anymore without hitting 15 different security vendors all claiming to do everything you’d want, start to finish, what makes ExtraHop different? How do you approach security that’s differentiated from the rest of the, I guess, entire security industry?Guy: Yeah, that’s a really good question. I think my favorite part, and one of the reasons I love our product is the data stream that we collect. Network data is a huge source of information that’s just sitting there silently, kind of, waiting to be consumed and analyzed. In the old on-premise environment, there were legacy packet capture solutions, or ways of grabbing this information from a SPAN or a TAP. But it’s still the same data stream as we go to the cloud, it’s just a slightly different way of collecting it.So, the biggest thing that I would encourage people is, use the data that’s there. The network traffic is passing your infrastructure: it’s EC2s hitting your S3 buckets, it’s RDS instances going through a load balancer to a Lambda function. It’s all just traversing through infrastructure that you just don’t own anymore, but getting that information is a huge differentiator. 
You’re talking about every packet of every transaction being analyzed in real-time at a cloud-scale, which, you know, you need a smaller instance today—it’s smaller today—you need a bigger instance tomorrow, it just auto-scales up.Corey: Now, back in the world of data centers, I agreed an awful lot with what you’re saying, as far as looking at the network as the first point of, I guess, the arbiter of truth, for lack of a better term. And, on some level in cloud, I feel like I’ve drifted away from that. Now, back in our days at data centers, you don’t know what’s running on these systems; you don’t know what various engineers are shoved onto them, but generally speaking, you can mostly trust the network. Please don’t email me. So, once you move into a cloud world, everything sort of changes a bit.You don’t really have to think about any of the layer 2 networking, and most of the layer 3 networking sort of goes away, too. Plus, let’s be very realistic; from the perspective of the virtual machines you’re running in a cloud environment, everything beyond that is kind of a lie. There’s a bunch of encapsulation, you’re higher up the stack, you’re not on hardware anymore so, on some level, it always felt that, eh, networking is not really the same thing in the cloud environment. I can ignore it. And I have to admit, back when I first started talking to you folks, I was something of a skeptic.And then you, more or less, made me change my perspective through a very sneaky approach of spinning up a test account for me with ExtraHop, and now I get it in a way I never did before. Is that aha moment common to the, I guess, the cloud-native set, or do most people come into this with a much more rational and reasoned approach to networking in the cloud?Guy: I would say it’s both. We have customers who are familiar with the type of information we can provide going through their cloud journey, or are starting their cloud journey and they want the same type of visibility. 
But for our net-new customers, when we hit that whitespace, that aha moment comes, and it’s so much fun to see. Someone who had no idea what this type of data can provide; they’re used to legacy telemetry or log information. So, that aha moment is something that, as someone who gets to interact with customers, is one of my favorite parts of the job. And I would say it’s fun to play with and show that. Corey: Now, I want to be clear that, again, in the interest of full disclosure, now, since I’ve put this in my test account, ExtraHop is now the second most expensive consumer of AWS services. But it’s not as bad as folks might think. It’s using a VPC mirror in order to look at traffic, and that costs me the princely sum of somewhere between $10 and $11 a month. And that doesn’t really vary, regardless of how much traffic I shove through this thing. It’s not doing a whole lot in the AWS account; if I didn’t know that was there and that’s what it was doing, I would ignore the spend line entirely. How does this work? What are you doing in order to get access to seeing what is happening, “On the wire,” quote-unquote, in a cloud environment? Guy: Just focusing on AWS for a second, since that’s what you called out. It’s using a native built-in functionality that Amazon provides. It’s called VPC packet mirroring. It’s super simple: you deploy an ExtraHop collector into your VPC, you set that up as a destination of your traffic, and then you configure what’s called a monitoring session in VPC. You can say I want it to send traffic based on these tags, I want it to send traffic based on this subnet—or any combination thereof—and it just kind of works. You know, it’s beautiful. And where we’re kind of taking this to the next step is using some intelligent Lambda automation to ensure that anytime a new instance gets spun up, whether it’s tagged, untagged, deployed into a different VPC, or is a different instance size, it gets automatically added into this data feed. 
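[Editor's sketch: the automation Guy describes here, adding every newly started instance to the mirrored data feed, could look roughly like the Lambda handler below. This is an illustration under stated assumptions, not ExtraHop's actual code. It assumes an EventBridge rule that delivers EC2 instance state-change events to the function, and the environment variable names `MIRROR_TARGET_ID` and `MIRROR_FILTER_ID` are hypothetical. AWS documents the feature as VPC Traffic Mirroring; `describe_instances` and `create_traffic_mirror_session` are the boto3 EC2 client calls used.]

```python
import os


def eni_ids_for_instance(ec2, instance_id):
    """Return the network interface IDs attached to one EC2 instance."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return [
        eni["NetworkInterfaceId"]
        for reservation in resp["Reservations"]
        for instance in reservation["Instances"]
        for eni in instance.get("NetworkInterfaces", [])
    ]


def mirror_instance(ec2, instance_id, target_id, filter_id, session_number=1):
    """Create one traffic mirror session per ENI and return the session IDs."""
    session_ids = []
    for eni_id in eni_ids_for_instance(ec2, instance_id):
        resp = ec2.create_traffic_mirror_session(
            NetworkInterfaceId=eni_id,         # source: the new instance's ENI
            TrafficMirrorTargetId=target_id,   # destination: the collector
            TrafficMirrorFilterId=filter_id,   # which traffic to copy
            SessionNumber=session_number,
            Description=f"auto-mirror {instance_id}",
        )
        session_ids.append(resp["TrafficMirrorSession"]["TrafficMirrorSessionId"])
    return session_ids


def handler(event, context, ec2=None):
    """Lambda entry point for EC2 state-change events (hypothetical wiring)."""
    detail = event.get("detail", {})
    if detail.get("state") != "running":
        return []  # only act once the instance is up
    if ec2 is None:
        import boto3  # imported lazily so the helpers can be exercised without AWS
        ec2 = boto3.client("ec2")
    return mirror_instance(
        ec2,
        detail["instance-id"],
        os.environ["MIRROR_TARGET_ID"],
        os.environ["MIRROR_FILTER_ID"],
    )
```

[Passing the client in as a parameter keeps the logic testable with a stub. A real deployment would also need error handling, for example for instance types that cannot act as mirror sources.]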
So, you know, you talk about the ephemerality of the cloud and how instances can spin up and spin down almost instantaneously, as soon as an instance is up, before it even gets any traffic sent to it, traffic is [laugh] coming to the ExtraHop, right? We’ll see IMDS traffic, we’ll see instance metadata, we’ll get the ENI information, all just by sitting there, passively listening.Corey: One of the things that I found particularly, I guess—appreciated about your entire approach is I didn’t have to change anything about what was actually running in this account. I didn’t have to teach the EC2 instances that something else was going on. I didn’t have to reconfigure anything on an application basis. This was purely done in the underlying VPC configuration. It was done without any downtime whatsoever.And I feel like that is an understated benefit for an awful lot of tooling. “Oh, just go ahead and roll this thing out to all of your environment.” Like, yeah, there are tens of thousands of instances and VMs scattered throughout our entire estate. Exactly how long do you think we’re going to spend on this? You don’t have that problem here, and it’s kind of nice.Guy: It is really nice. And not to take anything away from some agent solutions because they do have their [crosstalk 00:09:46]—Corey: Oh, I will, but please go on.Guy: [laugh]. But this approach to security and monitoring in the cloud, to your point, Corey, is seamless. Application owners don’t know it’s there. It doesn’t add any added load. I’m a former network engineer. Troubleshooting different instances or different virtual machines, the first thing I used to do is turn off those agents, right? Is this consuming CPU resources? Is this slowing down my agent? That’s no longer the case in cloud. That’s no longer the case with this network-based approach.Corey: I’ll also point out that it always feels like there’s a false dichotomy when we’re talking about security vendors. 
And it either feels like, oh, you’re in a bunch of data-center style environments, you’re migrating into the cloud, but basically today, your environment is a bunch of VMs, and maybe a load balancer or an object store. And a lot of tooling speaks super well to that use case. But then if you take a step back and look at well, the lie that all these companies love to tell themselves, and I’m no more immune to this than they are, to be very clear here, but we all tell ourselves this beautiful lie which is after this next sprint ends, then, then we’re going to go ahead and pay off all of our technical debt and things are going to be done properly with a capital P. And it never happens, but it’s the lie we tell ourselves.And we make financial decisions, in some cases, tied to that false vision of, “Well, why would I wind up embracing something that is aimed at that particular use case because once we wind up going full-on cloud-native and embracing our provider of choice, all of this stuff is going to change?” What I like about ExtraHop is, all right, assume you’re in that mythical born-in-the-cloud world where you have a significant estate that everything runs on top of these higher-level services. ExtraHop is still there, still working, and still doing exactly the sorts of things we’re talking about here. No matter where you are on that transformational journey, it feels like there’s an answer here. Is that accurate? Have I been gargling the marketing tea too heavily? What’s the story here?Guy: No, that’s pretty accurate. And it doesn’t really matter where you are on your cloud journey; security can’t be foregone for the sake of this cloud instance. We see this day in, day out. You know, if you subscribe to as many news alerts as I do, it’s a scary world. Just even recently this past weekend, we had a—not our customer, but there was an attack against an oil pipeline.That came through a cloud vulnerability. 
IAM account leakage, and service accounts, and open S3 buckets. It’s a scary part of this cloud journey. We want to make sure that we’re scaling, we want to reduce our physical footprint, but we can’t forgo the security and the trust that our customers have in our applications. And that means that having an approach to security in the cloud needs to be top of mind, regardless of where you are in that cloud journey.Corey: I think one of the, I guess, biggest concerns in the security space is very similar to what I deal with in the cost optimization space, which is people care about it only after they really, really, really should have cared about it, on some level. Now, over in the billing world that I live in, people generally have a failure mode of, “Well, we spent a little too much money,” and that is generally a very survivable thing. I used to say—tongue-in-cheek, only I was being completely serious—one of the reasons I went with AWS billing as my direction of choice was that no one is going to come and call me at two o’clock in the morning with a billing emergency; it is strictly a business hours problem. Security is a very different world. But if you screw up the bill, you spent too much money.If you screw up security, well, your company’s name is mud, you could try and pull a SolarWinds with a ring of ablative interns to wind up trying to pass the buck off onto, but in practice, you’re probably losing a CSO and a few other high-level execs as a sort of token offering to the market gods. And it’s painful, and I’m hard-pressed to name a company these days that has not suffered at least some form of data breach somewhere. It almost feels like it’s a losing game.Guy: It’s not a losing game, but it is a post-breach world, right? It’s not a question of, if you get breached. It’s more a question of what security holes have been left open, and what can they collect from these holes? 
And minimizing that attack surface is obviously critical, but understanding the damage and reacting to it as fast as possible is just as important. And honestly, that’s, kind of, one of my favorite parts about the cloud. You know, I can see something like a suspicious transaction, or a large increase in web traffic, and then fire off an API to Lambda that says, “Deploy the security group onto this instance.” That whole process takes milliseconds. So, the reaction time that we have with the cloud vastly surpasses what we ever had in the data center. And yeah, you’re right, maybe that ends up costing a little bit more, or creates a slightly higher bill because we called a couple Lambda functions, but no exfiltration of data; no loss of customer information. You can’t trade that off, at the end of the day. Corey: The thing that always, I guess, sort of bothered me about various breaches or various security reports is whenever companies will say definitively, “We have never suffered a security breach,” that might mean that they are absolutely on point—though, you always have this probabilities question—but it could also mean that they have no effective visibility or effective logging, and that is the dangerous part. It’s similar to this idea of back once upon a time in the early days of Unbreakable Linux, when Oracle was pushing that and they said, “It is unhackable.” The entire internet proved them wrong within hours because everything can be broken into at some point. It’s just a question of how high do you raise that bar? Ideally, a little bit above random people just scanning S3 buckets. Guy: Yeah, and you know, that’s really scary, kind of, the data that we get to see when—you know, you called this earlier that aha moment. Because we’re an always-on solution, we get to see the hygiene of the network, too. 
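[Editor's sketch: the automated response Guy mentions a little earlier in this exchange ("Deploy the security group onto this instance") might reduce to a single EC2 call inside a Lambda function. This is an illustration, not ExtraHop's implementation. The event shape (an `instance_id` key) and the `QUARANTINE_SG_ID` environment variable are hypothetical, and the quarantine security group is assumed to exist already with no permissive rules. `modify_instance_attribute` with `Groups` is the boto3 call that replaces an instance's security groups.]

```python
import os


def quarantine_instance(ec2, instance_id, quarantine_sg_id):
    """Swap the instance's security groups for a single quarantine group.

    Returns the group IDs that were attached before, so they can be restored.
    """
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    instance = resp["Reservations"][0]["Instances"][0]
    previous = [group["GroupId"] for group in instance.get("SecurityGroups", [])]
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])
    return previous


def handler(event, context, ec2=None):
    """Lambda entry point; `event["instance_id"]` is a hypothetical payload field."""
    if ec2 is None:
        import boto3  # imported lazily so the helper can be exercised without AWS
        ec2 = boto3.client("ec2")
    previous = quarantine_instance(
        ec2, event["instance_id"], os.environ["QUARANTINE_SG_ID"]
    )
    return {"instance_id": event["instance_id"], "previous_groups": previous}
```

[Returning the previous group IDs leaves a record that a later step could use to restore the instance once it has been investigated.]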
I can tell you when someone hit an insecure S3 bucket, or an IAM role logged in at two in the morning that it never has before, or someone sent an API command to Lambda to spin up another instance at two in the morning, using a service account that has admin permissions. It’s a scary world in the cloud, and making sure you have that surface covered gets you to those aha moments quicker. Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial. Corey: One thing that I do want to draw a little bit of attention to as well, having kicked the tires on ExtraHop for a few months now, I keep forgetting that I have it in place. And the only time I really get reminded is that $10 a month for that attachment to the VPC that I see on my bill when I go over that thing with a fine-tooth comb because of who I am and what I do. My point being is that I have instances in that account that are doing a bunch of relatively strange things from time to time. And the behavior is not consistent from day to day. One of them has an IRC bouncer hanging out on it because I used to spend a disproportionate amount of my time on freenode, and it does a whole bunch of different things that look super weird. And every time I wind up pointing a typical security product at it, it starts shrieking its head off—if it can even get that far into it—of, “This thing is clearly exploited. Shut it down, shut it down, shut it down.” And none of that happens. I mean, this thing looks very weird on the network, I’m not going to deny otherwise. 
This is my development box.When I’m on the road—remember back when we used to travel places?—and I would just be connecting from an iPad and remoting into this thing, and then I would have it do all of the things I would normally do on a desktop computer. But it doesn’t make noise. Now, to be clear, I also have a somewhat decent security posture on this thing so it’s not a story of it getting actively exploited and it should be making noise. But it just doesn’t say anything. It just sort of sits there quietly in the background. And it works. Whenever I log in, I have to click around to make sure it actually is still working because there’s nothing on the dashboard where it’s just giving you noise to talk about noise. Why is this such a rarity?Guy: [laugh]. So, your environment is probably pretty secure. I imagine you’re not deploying hundreds and thousands of containers and EC2s and spinning up all this type of data, but—Corey: No. It’s tiny, I spend 50 bucks a month on this account.Guy: So, it’s not atypical, the behavior you see. You know, I’ve been in POCs and proof of values where we deployed the ExtraHop, and it doesn’t see too much. And so one thing I’ve started doing for a lot of my customers is deploying a lab for them. Do you trust that something like an ExtraHop will see ransomware? Do you trust that ExtraHop will see credential harvesting, and lateral movement, and exfiltration?Or are you using your ExtraHop to troubleshoot your web applications? Let me spin up a lab for you, throw some workloads in there. We’ll drop a Kali instance or a Kubernetes cluster and show you what an attack surface can look like. 
Not to scare or, kind of, build on what customers are experiencing, but knock on wood, I don’t want any of my customers to be attacked, but I also have to build that confidence that if or when something happens, they’re covered.Corey: Back when I first had ExtraHop demoed for me, I was convinced it was going to be garbage, let me be very honest with you. And the reason was that the dashboard looked like it was demoware. It was well-designed, well-executed, it had a very colorful interface. It felt like bossware if I’m being perfectly honest. My belief has always been, you either get a good interface that works and is easy to use and navigate within, or you get something that looks super flashy when you do a demo on stage somewhere, but it is almost impossible to wind up effectively nailing both of those use cases. And then I started using this and I am having to eat those words because you actually did it. You wound up building something that looks great and is easy to navigate. How much work did that actually take? I mean, is that where all the engineering on this product has gone?Guy: We really appreciate it. Our UX team and our engineering group work very, very hard. We spend more on R&D and research than we do on a lot of our marketing and front-end sectors and it shows. The product kind of speaks for itself. And the experience that you’re describing with the easy-to-consume UI, with the data to support that experience behind it is our goal. And I’m happy to hear that you’re enjoying it in your lab.Corey: I just did a little poking around while I have you on the phone, and if I dig deep enough, it does tell me that there’s some weak ciphers in use. And every single one of these things is talking to an AWS-owned endpoint, which is, first, a little bit on the hilarious side, since I keep this thing current. Awesome. Secondly, the fact that I had to dig for that and it wasn’t freaking out about it. 
There are no alerts; it doesn’t show up on the dashboard. I had to really start diving into this. Because, yeah, it’s good to know if I’m doing some sort of audit activity, it’s good to know if I need to dive in and look at these things, but it doesn’t need to wake me up at two in the morning because, “Holy crap. The Boto3 library isn’t quite using the latest cipher suite.” How much tuning did this take? Guy: Not much. So, there is a learning period, as with any application that has a backend on behavioral analytics. But most of my customers, usually two to three weeks after we start seeing a data feed, are in a state of excellent tuning. Very little manual tuning required, the system will learn normalities, it’ll learn behaviors, and it’ll flag anomalies, kind of, on its own. So, the same experience that you’re having where you’re running a compliance scan, or you’re running an audit, or you’re trying to look for, in this world where—I’m going to make a joke here—we all have free time, and you have the time to go look at, you know, “How do I clean up some of these hygienic issues that are not currently causing me heartache?” The data is there. That’s the beauty of the network: some of your users may be familiar with Wireshark, or something like a tcpdump. There’s boatloads of data in there. There are thousands and thousands of data points you can analyze, though. If you want the data, it’s there, but like you said, no reason to wake you up at two in the morning unless we see things that are super critical. Corey: Encrypt everything sort of becomes the theme, especially when Amazon’s CTO slaps it on a t-shirt, and then in some cases charges extra for it; but that’s a diversion. What is the story as you start seeing more and more traffic wind up being encrypted at a bunch of different levels? In fact, I’ll take it a step further. 
With the rise of customer-managed keys and things like KMS in the AWS world, does that mean that ExtraHop is effectively losing visibility beyond just the typical TCP flow? Guy: So, ExtraHop is unique in the space in that we have the ability to decrypt TLS 1.3 data. It came out a couple years ago and it’s a way of encrypting traffic between servers and clients in a manner that isn’t as breakable as historic encryption mechanisms were. We can parse that data, we can ingest those decryption mechanisms, we can—in real-time, without being a man-in-the-middle so we’re not breaking any of this trust chain that you have to explicitly build to the internet in a lot of cases, or you don’t have to upload any of your private keys to the ExtraHop. So, it’s a super unique approach for how we can unpack that envelope. This goes back to when we were kids, and we all got those Christmas presents and you check the box and you try to guess what’s inside. And maybe you’re right, maybe you’re not, but until you open that wrapper, you can’t really know what’s being said. So, something like a hidden database transaction underneath a web call just shows up as a web call when you’re not unpacking the envelopes. Decryption is an underrated feature, in my opinion, and I would—you know, a true security posture team should probably have something where they can look inside those payloads. Corey: This is where it starts to get a little weird, too, because, on some level, great, the whole premise of TLS is that my application talks to something far away—or nearby. It doesn’t really matter—but there’s a bit of a guarantee that from the point it leaves that application and hits the encryption side on the instance to the other end, there should be no decryption there. 
The only way I’ve ever seen that get around that is effectively man-in-the-middling these things, which in some level, “Oh, decrypt all of your secure traffic in the name of security,” always felt a little on the silly side.Guy: Not only is it silly, it’s a little harder to manage when we talk about cloud because those man-in-the-middle decryption mechanisms typically involve building explicit trust so that they can decrypt the traffic, and then the client and the server both agree that, “Yeah, sure. You can read my information. You use your own certificate. I don’t care.” That gets harder to do as you start talking about containers, as you start talking about ephemeral instances.Sure, you can build a golden image of a container and make it trust your IPS—which most people should have—but you still have to have the ability to see this traffic when you’re bypassing certain metrics. If you’re bypassing traffic back to your data center so you can [unintelligible 00:24:45] your point of sale application, or if, maybe, you’re a multi-cloud environment where you have to pass from cloud to consume all of your data space. You still have to be able to see that data to understand what’s really being said during the conversation without always being able to break that trust chain.Corey: One thing that I want to make very clear I call out because otherwise, I am going to get letters on this. This is a promoted episode. You folks have paid to sponsor. Thank you. It is appreciated. But I want to be very clear you buy my attention, not my opinion. I know I’ve been, sort of, gushing about what ExtraHop does, and how it works, and how I view these things, but that’s not because you’re paying me to do that. 
I am legitimately excited about the product itself. This is one of those things where it finally is giving me visibility into something that I understand from my olden sysadmin network admin days combined with how I know the cloud works today, and I’m looking at this and the strange spots that I see of, “Ohh, I would improve that a bit,” there aren’t that many and they’re not that big. This is something that is legitimately awesome, and I would encourage people to kick the tires and see what they think. Guy: Yeah, we appreciate that feedback, Corey. A lot of us are previous users. I myself, you know, before coming to ExtraHop, used ExtraHop at a previous job and that was one of the big reasons I came to work for the company is I believe in the software. A lot of our people here, and we have long-term employees, believe in what we do. And our goal is to build this partnership and trust with our customers, too, so that they have the same experience that you do. It’s a fun product to play with, and kicking the tires is fun and we’d love to show you. Corey: When you start talking to folks who are going through their, I guess, ExtraHop journey of discovery—don’t ever use that term. It sounds awful—what do you find that they are getting the most confused about? What do they misunderstand that would be helpful for them to have more clarity around? Guy: There’s a lot of what ExtraHop can provide when it comes to data ingestion, and data collection, and even data aggregation, but where a lot of my customers fall in the confusion space tends to be in, “Do I care about this data? Should I care about this information?” And that really falls down to the individual user’s responsibility. A security team cares about all of it, whereas an application team may only care about the website’s performance, or the network latency, or the error rates. 
And it runs the gamut. So, one thing that I do with a lot of my customers is weekly training sessions, or give them access to videos that we’ve recorded in advance so they can self-teach. As an engineer myself, I hate when people talk me into things: I like to play, and I like to see. So, let me give you a guide, you want to play with it, kind of poke the toes, kick the tires, have fun, that seems to get customers excited, and again, back to that aha moment a lot quicker. There’s so much data that gets exposed, and sometimes it can be overwhelming. But when it comes to visibility, it’s all stuff that’s useful at the end of the day. Corey: If people want to learn more, where can they go next? How do they begin this journey? And of course, mention me just because every time someone talks to a sponsor and brings my name up, the reflexive wince is just my favorite look in the world. Guy: Yeah, so definitely mention Corey’s name. [laugh]. We have online demos where people can play with the lab, you go to extrahop.com/demo. We also offer AWS trials if you want to actually deploy one and see what it looks like in your environment for a period of time. And we have teams all over the world, from the United States, EMEA, APAC, that are happy to help answer questions, help deploy, and help automate a lot of this, whether it be through something like a CloudFormation template, or Terraform scripts, whatever infrastructure as code language you choose to use. Corey: Excellent. Thank you so much for taking the time to speak with me today. I really do appreciate it. Guy: Yeah, Corey, it’s been a pleasure talking to you. And I’m looking forward to maybe having another one with you in the future. Corey: Oh, I would expect so. I’m curious to see what happens next. Guy Raz, senior systems engineer at ExtraHop. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and an insulting comment that will no doubt get flagged by ExtraHop as being something that shouldn’t be on the network.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About AustinAustin makes problems with computers, and sometimes solves them. He’s an open source maintainer, observability nerd, devops junkie, and poster. You can find him ignoring HN threads and making dumb jokes on Twitter. He wrote a book about distributed tracing, taught some college courses, streams on Twitch, and also ran a DevOps conference in Animal Crossing.Links: Lightstep: https://lightstep.com/ Lightstep Sandbox: https://lightstep.com/sandbox Desert Island DevOps: https://desertedislanddevops.com lastweekinAWS.com Resources: https://lastweekinAWS.com/resources Distributed Tracing in Practice: https://www.amazon.com/Distributed-Tracing-Practice-Instrumenting-Microservices/dp/1492056634 Twitter: https://twitter.com/austinlparker Personal Blog: https://aparker.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of canary.tools. 
You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Austin Parker, who’s a principal developer advocate at Lightstep. Austin, welcome to the show.Austin: Hey, it’s great to be here.Corey: It really is. I love coming here. It’s one of my favorite places to go. So, let’s get the obvious stuff out of the way. You’re a principal developer advocate at Lightstep. I know this because I said it a whole sentence ago, which is about the limit of my attention span. What is Lightstep? And what does your job mean?Austin: So, Lightstep is an observability platform. 
We take traces, and metrics, and logs, and all that good stuff, throw them together in a big old swamp of data, and then, kind of, give you some really cool workflows to help you make sense of it, figure out, hey, where is the slow SQL query? Where is the performance bad?

Corey: The way to figure out, in most of my environments, where’s the performance bad is git blame, figure out what part I wrote.

Austin: But imagine there were, like, 1000, or 100,000 of you all working on this massive distributed system, and you didn’t know half—

Corey: It would snark itself to death before it ever got off the ground.

Austin: Yeah. I mean, I think that’s actually most large companies, right? We deliver shippable software only through inertia.

Corey: Yeah. Just because at some point, it bounces off all the walls, there’s nowhere else for it to go but to production.

Austin: Yep. But yeah, you have thousands of people, hundreds of people, however many people, right? I think the whole distributed workforce thing that most people are dealing with now has really made observability rise to the top of your concern list because you don’t have the luxury of just going and poking your head around the corner and saying, “Hey, Joanne. What the heck? Why did things break?” You can’t just poke someone anymore. Or you can, but you never know what you’re going to have to deal with.

Corey: It feels weird to call them at home or bug their family members to poke them or whatnot. It just seems weird.

Austin: It does. And until Amazon comes out with a minder drone that just, kind of like, hovers over your shoulder at all times, and pokes you, when someone is like, “Hey, you broke the build.” Then I think we’re going to need observability so that people can sort of self-serve, figure out what’s going on with their systems.

Corey: Cool.
One of the things I’m going to point out is that I’ve had a bunch of people attempt to explain what distributed tracing is and how observability works, and it never really stuck. And one of the things that I found that did help explain it—and we didn’t even talk about this in the pre-show, while we figure out how to pronounce each other’s names—but one of the things that has always stuck with me is the interactive sandbox on Lightstep, which used to be prominently featured on your page; now it’s buried in the menu somewhere. But it’s an interactive sandbox that sets up a scenario, problem you’re trying to solve, gives you data—so it gets away from the problem of, “Step one, have a distributed application where it’s all instrumented and reporting things in.” Because in a lot of shops, that’s not exactly a small lift that you can do in an afternoon to start testing things like this out. It’s genius. It shows what the product does, how it works, mapped to the type of problems people will generally encounter. And after I played with this, “Oh, my stars, I get it.”

Austin: We actually just recently updated that to add some new stuff to it because we shipped a feature called ‘Change Intelligence’ where you can take actual time-series metrics, and then overlay those on traces and say, “Hey, I saw a weird spike,” and highlight that, and then we go through, look at all the traces for that service and its related services during that time, and tell you, “Hey, we think it might be this. Here’s things that are highly correlated in those time windows.” So, if you haven’t checked it out recently, go back and check it out. It’s—yeah, a little more hidden than it used to be, but I believe you can find it at lightstep.com/sandbox.

Corey: Yeah. And there’s no sign up to do this. It’s free access. It asked for an email address, but that’s okay, I just use yours. No, I’m not kidding. I actually did. And, yeah, it works; it shows exactly what it is.
It even has, instead of ‘start’ it says ‘play’ because that’s fundamentally what it is. If you’re trying to wrap your head around distributed tracing, take a look at this.

Austin: Yes, definitely. I have a long-standing Jira ticket to add achievements to that.

Corey: Oh, that could be fun. You could bury some, too, like misusing services as databases—

Austin: Ooh.

Corey: —or most expensive query to get the right answer.

Austin: Yeah. And then maybe, like, there’s just one span, kind of, hidden there where it’s ‘using Route 53 as a database.’

Corey: I keep seeing that cropping up more and more places. That’s something I get to own and that’s an awful lot of fun. Speaking of gamification and playing in strange ways, one of the things you did last year that I wasn’t paying attention to—because, you know, there was a pandemic on—was you were one of the organizers behind Desert Island DevOps which is a strange thing that I’ve only recently delved into—delven into—gone spelunking inside of. There we go.

It wasn’t instrumented for observability—buh-dum-tss. But it’s fundamentally a DevOpsDays that takes place inside the animated world of Animal Crossing’s New Horizon, which is apparently a Nintendo game, which is apparently a game company.

Austin: Yeah.

Corey: It is not really my space. I don’t want to misspeak.

Austin: No, you hit it. ‘Deserted.’ Deserted Island [crosstalk 00:05:43].

Corey: Oh, ‘Deserted.’ Ah, got it. And don’t spell it as ‘dessert’ either, as in this would be a delicious game to play.

Austin: I mean, it is a delicious and comforting sort of experience. If you aren’t familiar with Animal Crossing, the short 30-second explanation is it is a life simulator, building game where, you as your character, you are on an island, and there are relatively adorable animal NPCs that are your villagers, and you can talk to them, and they will say funny things to you. You can go around and do chores like picking up fruit or fishing.
And the purpose is, kind of, do these chores, get some in-game currency, and then go spend that in-game currency on furniture so that you can make a pretty house, or buy pretty clothing. And it came out at a perfect time last year because everyone was about to bundle inside for the—well, we’re still inside—but everyone had to go inside. And suddenly, here’s this like, “Oh, it’s just this cute, sort of like, putz around and do whatever.”

Corey: It was community-oriented. It was more of a building-oriented game than a destruction game.

Austin: Yeah.

Corey: It’s the sort of thing that is a great way of taking your mind off your troubles. It is accessible to a bunch of people that aren’t generally perceived as gamers when you think of that subculture. It really is an encompassing, warm, wonderful thing—by all accounts—and you looked at it and figured, “All right, how can we ruin something?” And the correct answer you got to is, “Let’s pour DevOps on it.”

Austin: Yeah. Let’s use this as an event platform, and let’s really just tech-bro this shit up.

Corey: And it seems to work super well. At the time of this recording, I have submitted a talk that I live-streamed my submission around, and I have not heard in either direction. To be perfectly frank, I forget what I wound up submitting, which is always a bit of a challenge, just because I make so many throwaway random jokes that, cool. Well, we’ll see how it plays out. I think you were even in the audience for that on the Twitch stream.

Austin: Yeah. You found some bugs on the CFP form [laugh] that I had to fix.

Corey: To be clear, the reason I do those things is not because it’s a look how clever I am, but rather to instead talk about how it’s not scary to submit a talk proposal. Everyone has a story that they can tell. And you don’t need a big platform or decades of experience in this space to tell a story. And that was my goal, and I think I succeeded.
You would have the numbers more than I do; I hope people wound up submitting based upon seeing that. I want to hear voices that, frankly, aren’t ours all the time.

Austin: I think in, like, a week, we basically got more submissions than we did for the entire CFP last year. One thing that I kind of think is interesting to bring up because you bring up, oh, we don’t hear a variety of voices, right? One thing I tell people, and I know that it’s not universally applicable advice, but I got into DevRel as a—not quite luck, but, like, everything in my life is luck, on some level. It always plays some level of importance. But I didn’t go to school to get into DevRel, I didn’t do a lot of things.

I have actually been in tech, maybe—depending on how you want to count it—in terms of actually being in a software development job or primarily software development job, maybe, like, five or six years, give or take. And before that, I did a lot of stuff. I was a short-order cook; I worked at gas stations; I did tech support for Blackberry, and I did a lot of community organization. I was a union organizer for a little while. I like DevRel because it’s like, oh, this kind of integrates a lot of things I’m interested in, right?

I enjoy teaching, helping people, and helping people learn, but I also like talking; I like to go and be a public figure, and I like to build a platform and use that to get a message out. And I think what I did with Deserted Island, or what the impetus there was, we suddenly were in a situation where it’s like, “Hey, there’s a bunch of people that normally get together and they fly around the globe in decent airplane seats, and people come and see us talk.” Because why? Because they think we know what we’re talking about, or because we have something that shows we know what we’re talking about, or however you want to say it.
But in a lot of cases, I think people are coming for that sort of community, they’re coming because, “Hey, I can go to a room and I can sit in some weird little hotel, or conference center, or whatnot, and everyone I look at, everyone I see is someone that is doing what I’m doing, on some level. These are all people that are working in technology, they’re building things, they’re solving problems.”

And that goes away really quickly when you get into this remote-first world, and when we can’t travel, and we don’t have that visual aspect. So, what I wanted to do with Deserted Island, what I thought was important about it is, I was already sick of Zoom by the time everyone went to Zoom; I was already sick of the idea of, oh my god, a year or two years of these sort of events and these community things just being, like, everyone’s staring at a bunch of slides and a talking head. Didn’t sound very appealing, so what if we try something different? What if we do something where it’s like, look, we’re going to take people out of their day; we’re going to put them in somewhere else. And maybe that somewhere else is just, hey, you’re watching people run around on an Animal Crossing Island on a Twitch stream.

But that sort of moment of just, like, this isn’t what you would normally be doing, I think takes people’s heads out of their normal routine and puts them in a place where they can learn, and they can feel community, and they can feel, like, a kinship. I also think it’s really important because it’s that whole stupid New Yorker joke of, “On the internet, nobody knows you’re a dog.” We have this really cool opportunity to craft who we are as people, and how we present that to the world. And for a lot of people, you’re stuck inside; you don’t get that self-expression, so here’s a way to be expressive, right?
Here’s a way to communicate who you are on a level that isn’t just a profile picture or something, or things that don’t work as well over Zoom.

It’s a way to help project your identity. And that, I think, gives more weight to what you’re saying because when you feel like, “Hey, this is more of who I am,” or, “This is a representation of me. I can show something about who I am.” And that helps you speak. And that helps you deliver, I think, an effective talk. And that, again, builds community and builds these bonds.

Corey: I want to talk to you about that, specifically because you are one of those people that aligns very much with my view of the world on developer marketing. But I don’t want to lead you too much on this, so why don’t you start? Take it away. Where do you stand on developer marketing? And what do people get wrong?

Austin: I think the thing that a lot of people get wrong is that they try to monetize the idea of community. If you go and you search, insert major company name here; you search “Amazon community,” or you search “Microsoft community,” or you search “Google community,”—well, if you do that, you’ll get no results, but whatever, right? You get the picture that marketers in a way have turned the idea of developer community into something that you can just throw a KPI or throw an OKR on and squeeze it for money. And I don’t like that. I’m not very comfortable with that idea of community—because I think community in a lot of ways, it’s like family. And the families that you like the best are the ones you choose. I think this is—

Corey: The family you choose is an important concept.

Austin: Right. And for the most part… so much of human experience activity is built around finding those people you choose, and those communities develop out of that. I use AWS sometimes, I don’t necessarily know if I would put myself in a community with every other AWS user. I—

Corey: Oh, I certainly wouldn’t. This is the problem.
Everyone thinks when you talk about community or a group of people doing something, they’re ‘other people’ that are in some level of otherness. And that’s—like there are entire communities around AWS that I do not talk to, I do not see, I do not pretend to understand.

Austin: Yeah, even at Lightstep. We’re not a massive, massive company by any means, but we have a bunch of different users that are using our tool in different ways. And they all have different needs, and they all have different wants. So, I could say, “Oh, here’s the Lightstep community.” But it’s not a useful abstraction.

It’s not a useful way to abstract all of our users because any tool that’s worth using is going to be this collection of other abstractions and building blocks. Like, you… I don’t know, look at something like Notion, or look at something like Airtable, or the popularity of low or no-code stuff, where someone built a platform and then other people are building stuff on top of that platform, if you go to those user groups or you go to those forums, and it’s just like, there’s a million, million different varied use cases, and people are doing it in different ways, and some people are building this kind of application, or that kind of application, or whatever. So, the idea of, oh, there’s a community and we can monetize that community somehow, I’m uncomfortable with that from, sort of, a base level. And I’m uncomfortable with the idea of the DevRel industry—or the developer marketing industry—kind of moving towards this idea of, like, we’re going to become community marketers or whatever. I think you have to approach people as individuals.

And individuals are motivated by a lot of things. They’re motivated by, can you solve this problem? Do I like you? Are you funny? Whatever.
And I believe that if you’re a developer tool, and you are trying to attract developers, then [sigh] it works a lot better, I think, to have just individuals, to have people that can help influence the much broader—the superset of all developers that might have an interest in what you’re doing by being different, I guess.

Being something that’s like, hey, this is entertaining, or this is informative, or this is interesting. The world is not a meritocracy. The world is governed by many, many different things. You’re not going to win over the developer industry simply by going out and having the best white papers, or having one more ad read than your competitor. You need to do something to get people interested and excited in [sigh] a way that they can see themselves using it.

It’s like, why did Apple go and do ‘Think Different’ ads? Because it’s like, you using a Mac, that’s kind of like being Einstein, or that’s kind of like being Picasso. This is basic marketing stuff that I feel like a lot of technical marketers or developer marketers sort of leave at the door because they think the audience is too sophisticated for it, or their—

Corey: I’ll even soft-launch it here because I haven’t at this point in time, talked about it in public, but if you go to lastweekinAWS.com/resources we wrote our own developer marketing guide because I got tired of explaining the same type of thing again, and again, and again. It asks for an email address and it sends it to you—I know, I’m as guilty as any. And I, of course, called it ‘Devreloper,’ which is absolutely a problem with me and how I talk about things. But I’m right.

And it goes to an awful lot of what you’re saying.
An example that you just talked about of giving people something rather than trying to treat them as metrics, one of the best marketing things I’ve seen you do, for example, is you wrote O’Reilly’s Distributed Tracing in Practice which means if someone has a question about distributed tracing and how it’s supposed to work, well, that’s not a half-bad resource. And okay, I’ve read it and I have some further questions. Let me track down the author and ask them. Oh, you work at a company that is in this space? Huh. Maybe I’ll look into this. And it’s a very long-tail story. And how do you attribute that as far as, did this lead come from someone who read your book or not will drive marketers crazy.

Austin: Oh, it’s super hard. And it does drive them crazy. [laugh].

Corey: Yeah, my answer is, I don’t know and I don’t care. One of the early sponsors of this podcast sponsored for a month and then didn’t continue because they saw no value. A month goes by, they bought out everything that held still long enough, and, “Thank you for your business.” “Can you explain to me what changed?” “Oh, we talked to some of our big customers and it turned out the two of them had heard about us for the first time on your show.”

And that inspired them to start digging into it and reaching things out, but big companies, corporate games of telephone, there was no way to attribute that. My firm belief is, on some level, that if you get in front of an audience with a message that resonates and—and this is the part some people miss—is something that solves an actual problem that they have. It works. It’s not necessarily predictable and it’s hard to say that this thing is going to go big and this thing isn’t. So, the solution, on some level is just keep publishing things that speak to your audience. But it works, long term. I’m living proof of this.

Austin: Yeah.
I think that it makes a lot more sense to… rather than to do, sort of, I don’t want to say vanity metrics, but kind of vanity metrics around, like, oh, this many stars, or this many forks, or whatever. There’s a lot of people, especially in this OSS proximate world, where you have a lot of businesses that are implicitly or explicitly built on top of an open-source project, not everyone that is using your open-source project is going to, one, be capable of converting into a paid user, or two, be super interested in it. And I would rather spend time thinking about, well, what is the value someone gets out of this product?

And even if that only thing is, is that, hey, we know what we’re talking about because we’ve got a bunch of really smart people that are building this product that would solve their problem. If you want to go out and build your own internal observability solution using completely open-source tools, the Grafanas and Prometheuses of the world, great. Go for it. I’m not going to hold you back. And for a lot of people, if they come to me and say, “Well, this is what we got, and this is what we’re thinking about.”

I’ll say, “Yeah. Go for it. You don’t need what we’re offering.” But I can guarantee you that as it scales and as it grows, then you’re going to have a moment where you have to ask yourself the question of, “Do I want to keep spending a bunch of time stitching together all these different data sources, and care and feeding of these databases, and this long term storage, and dealing with requests from end-users, or do I just want to pay someone else to solve that problem for me? And if I’m going to pay someone else, shouldn’t I pay the people who literally spend all day every day thinking about these problems and have had decades of experience solving these problems at really big companies that have a lot of time and effort to invest in this?”

Corey: This episode is sponsored in part by our friends at Lumigo.
If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.

Corey: Oh, yeah. We’re doing some new content experiments on our site, and what we’re doing is we’re having some folks write content for us. Now, when people hear that, what a lot of marketers will immediately do is dive down the path of, “Ah. I’m going to go ahead and hire some content farm.” Well, that doesn’t work. I found that we wound up working with individual people that work super well.

And these are people who are able to talk about these things because their day job is managing a team of 30 SREs or something like that, where they are very clearly experts in the space. And I want to be very clear, I’m not claiming credit for our content writers; they get their own bylines on these things.

Austin: Yeah.

Corey: And it turns out that that, over time, leads to good outcomes because it helps people find what they need. There’s the mystical SEO Juju that I don’t pretend to understand, but okay, I’m told it’s important, so fine, whatever. And it makes for an easier onboarding story, where there are now resources that I can trust and edit if I need to, as things change, that I can point people to, that isn’t a rotating selection of sketchy sites.

Austin: Mm-hm. I think that’s one thing that I would love to see more of, just not in any one particular part of the tech industry, but overall, the one thing I’ve noticed, at least in the pandemic, during this whole work-from-home, whatever, whatever, we don’t talk enough.
And it sounds maybe weird, but I think this actually goes back to what you’re saying earlier, about everyone having a story to tell. People don’t feel comfortable, I think, putting their opinion out there or saying, “Hey, this is what worked. This is what didn’t work.”

And so if you want to go find that out—like, if I wanted to go write something about, hey, these are the five things you should do to ensure you have great observability, then that’s going to involve a lot of me going around and sort of Sherlocking my way through StackOverflow posts, and forums, and reaching out to people individually for stories and comments and whatever. And I would love to see us get to a point where we’re just like, “Actually, no. This isn’t—we should just be sharing this. Let’s write blogs about it.” If you’re sitting there thinking no one’s going to find this useful, right—like, you solve a problem, or you see something that could have worked better, and you’re like, “Eh, no one else is going to find that valuable.”

I can almost guarantee you that someone is going to find that valuable. Maybe not today, maybe not tomorrow, but go ahead and write about your experiences, write about the problems you’ve solved, write about the things that have vexed you, and put that on the internet because it’s really easy to publish stuff on the internet.

Corey: Yes. Which is a blessing and a curse. That is very much a double-edged sword.

Austin: That very much is a double-edged sword. But I think that by biasing towards being more open, by biasing towards transparency and sharing what works, what doesn’t work, and having that just kind of be the default state, I’m a big proponent of things like radical transparency in terms of incident reports, or outages, or hiring, or anything. The more information that you can put in the world is going to—it might not make it better, but it at least helps change the conversation, gives more data points.
There was a whole blow-up on Twitter this week, where someone posted like, “Hey, this is a salary I’m looking for.” I think you—

Corey: Oh, yeah. She’s great.

Austin: Yeah, she’s worth it, right? And the thing that got everyone’s bee in a bonnet was, like, she’s saying, “Oh, I want $185k.” And it’s like, “Well, why don’t we just publish that information?” Why isn’t everyone just very open and honest about their salary expectations? And I know why: because the paucity of information is a benefit to employers and it works against employees.

There was a lady that left—gosh, where was it? [sigh] I forget the company, but she left because she found out she was systemically underpaid compared to her male peers. Having these sort of information imbalances don’t really help the people at the bottom of the pyramid. They don’t help the little guys. They really only help the people that are in the very large companies with a lot of clout and ability to control narratives.

And they want it to stay that way; they don’t necessarily want you to know what everyone’s salary is because then it gives you, as someone trying to get a job, a better negotiating position because you know what someone with your level of experience is worth to them.

Corey: It’s important to understand the context behind these salary negotiations and how to go about getting interviews and the rest. The entire job-hunting process is heavily biased in favor of employers because, especially at large employers, they go through this multiple times a week, whereas we go through this, as employees basically, every time we change jobs. Which for most people is every couple of years and for me, because of my mouth, it’s every three weeks.

Austin: Yeah. I’m not saying it’s a simple solution. I am advocating for, sort of, societal, or just cultural shifts, but I think that it all comes full circle in the sense that, hey, a big part of observability is the idea that you need to be able to ask arbitrary questions.
You want to know about unknown unknowns. And maybe that’s why I like it so much as a field, why I like tracing, why I like this idea.

Because, yeah, a lot of things in the world would be interesting, and different, and maybe more equitable if we did have more observability about not just, hey, I use Kafka, I use these parameters on it, and that gives me better throughput, but what if you had observability for how HR runs? What if you had observability for how hiring is done? And that was something that you could see outside of the organization as well. What if we shared all this stuff more, and more, and more, and we treated a few less things as trade secrets? I don’t know if that’s ever going to happen in my lifetime, but it’s my default position. Let’s share more rather than less.

Corey: Yes, absolutely. Especially those of us with inordinate amounts of privilege. And that privilege takes different forms; there’s the usual stuff people are talking about in terms of the fact that we are over-represented in tech in many respects, but there are other forms of privilege, too. There’s a privilege that comes with seniority in the space, there is a privilege with being a published author, in your case, there is privilege in having a broad audience, like I do. And it just becomes this incredibly nuanced story.

The easiest part of it to lose sight of—at least for me—is I tell stories about what has worked for me and how I’ve done what I do, and I have to be constantly conscious of the fact that there is that privilege baked in and call it out where I can. I’ve gotten much better at that, but it’s an ongoing process. Because what works for me does not work for other people across a wide variety of different axes. And I don’t want people to feel bad based upon what I say.

Austin: Oh, yeah, absolutely. I mean, I’m in the same boat.
Like, I tend to be very irreverent and/or shitpost-y and I don’t have much of an explanation other than, I learned at some point in my life, that it’s just… [sigh] I would rather go through life shitposting on Twitter, rather than be employable. It’s just who I am. There’s—I’m sure some people think I come off as rude. I don’t know. I also agree, you’d never punch down. You only punch up. But you never know how other people are going to take that, and I don’t think that it always gets interpreted in the spirit it was meant. And I can always do better, right?

Corey: As can we all. The hard part for, I think, a lot of us is to suppress that initial flash of defensiveness when someone says you didn’t quite get there, and learn from the experience. One of the ways I do that, personally, is I walk away before responding, sometimes. I want to be a better version of myself, but when I get called out of—like, this tweet thread is the whitest thing I’ve seen since I redid my bathroom walls, and I get a flash of defensiveness, “Excuse me. That’s not accurate.”

And… and then I stop and I think, and then sanity prevails, where it’s, yeah. There’s a lot of privilege baked into my existence, and if I don’t see it, that doesn’t mean it’s not there. I have made it a firm rule of not responding defensively to things like that, ever. And there are times when I get called out for aspects of how I present that I don’t believe are justified, to be very honest. But that is a me thing; that is not them, and I welcome the feedback, regardless. If you make people feel like a jerk for giving you feedback, they stop giving you feedback. And then where are you?

Austin: Yeah. Funny anecdote. I wrote a blog for my personal blog a little while ago about, oh, togetherness, community, something like that. But I wrote—the intro was something like, talking about why people love Sweet Caroline, right? Favorite song in the world.

Corey: [sings].

Austin: [joins in]. Yeah.

Corey: Yeah.
I’m not allowed to play with that song here at The Duckbill Group because one of our employees is named Caroline and, firm rule: don’t make fun of people’s names. They’re sensitive about it, and let’s not kid ourselves here, I own the company. Even if she says, “It’s fine, I love it.” That doesn’t help because I own the company. There is a power imbalance here.

Austin: Yeah.

Corey: I don’t know that she would feel that she had the psychological safety to say, “That’s not funny.” I absolutely hope she would because that’s the culture that I spend significant effort on building, but I can’t depend on that. So, I don’t go down the path of making those jokes. But I—yes, I love the intro to the song. Please continue.

Austin: It’s great. Everyone loves it. So, the intro of my initial paragraph was ruminating on that. And this post went around enough that it got submitted to Hacker News a few times, and the only comment it got was some mendacious busybody Hacker News type going on about why I would be so racist against white people. [laugh]. And I was just like, “And this is why I don’t come to this website at all.”

Corey: Yeah. There are so many things on Twitter that are challenging and difficult and obnoxious, and it’s still the best thing we have for a sense of community. This has replaced IRC for me, to be perfectly honest.

Austin: Yeah. No, I used to be big on IRC, and then I left because [sigh], well, a couple reasons. One, I really liked being able to post gifs.

Corey: Yeah, that is something where the IRC experience is substandard. I was Freenode network staff for years—

Austin: Oh wow.

Corey: —and that was the thing to do. Now, turns out that the open-source dialogue and the community dialogue have shifted form. And I still hang out there periodically for specific things, but by and large, it’s not where the discourse is.

Austin: Yeah, it is interesting.
It’s something that concerns me, kind of, in a long term sense about not only our identity but also, sort of, the actual organic communities we formed, we’ve put on to these extremely unaccountable privately held platforms whose goal is monetization and growth so that they can continue to make money. And for as much as anyone can rightfully say, “Hey, Twitter’s missed the mark,” a lot of times, it is a hard balance to strike. They don’t have simple questions to answer, and I don’t necessarily know if the nuance of their solutions has really risen to the challenge of answering those well, but it’s a hard thing for them to do. That said, I think we’re in a really awkward position where suddenly you’ve got the world’s collection of open-source software is being hosted on a platform that is run by Microsoft, and I am old enough to remember. “Embrace, extend, extinguish.”Corey: Oh, yeah. I made an entire personality out of hating Microsoft.Austin: Yeah. And I mean, a lot of people still do. I read MacRumors sometimes, and they’re all posting there still. Or Slashdot.Corey: I wondered where they’d gone. I didn’t think everyone had changed their mind.Austin: I had just a very out-of-body moment yesterday because someone replied to a comment on mine about Slashdot on it, and then the Slashdot Twitter account liked it. And there exists a photo of me from when I was a teenager, where I owned a Slashdot ballcap. And that picture is somewhere in the world. Probably not on the internet, though, for very good reason.Corey: I’m mostly just still reeling at the discovery that there’s a Slashdot Twitter account. But I guess time does evolve.Austin: It does. It makes fools of us all.Corey: It really does. Well, Austin, thank you so much for taking the time to speak with me. If people want to learn more about what you’re up to, how you view the world, et cetera, et cetera, et cetera. Where can they find you?Austin: So, you can find me on Twitter, mostly, at @austinlparker. 
You can find my blog with various musings that is updated frequently at aparker.io and you can learn more about Deserted Island DevOps 2021, coming on April 30th this year, at desertedislanddevops.com.

Corey: Excellent. And we will put links to all of that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it.

Austin: Thank you for having me. This was a lot of fun.

Corey: It really was. Austin Parker, principal developer advocate at Lightstep. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and then a giant series of comments that all reference one another and then completely lose track of how they all interrelate and be unable to diagnose performance issues.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

This has been a HumblePod production. Stay humble.
About Donnie

Donnie is VP of Products at Docker and leads product vision and strategy. He manages a holistic products team including product management, product design, documentation & analytics. Before joining Docker, Donnie was an executive in residence at Scale Venture Partners and VP of IT Service Delivery at CWT leading the DevOps transformation. Prior to those roles, he led a global team at 451 Research (acquired by S&P Global Market Intelligence), advised startups and Global 2000 enterprises at RedMonk and led more than 250 open-source contributors at Gentoo Linux. Donnie holds a Ph.D. in biochemistry and biophysics from Oregon State University, where he specialized in computational structural biology, and dual B.S. and B.A. degrees in biochemistry and chemistry from the University of Richmond.

Links:
Docker: https://www.docker.com/
Twitter: https://twitter.com/dberkholz

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more.
They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.

Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Today I’m joined by Donnie Berkholz, who’s here to talk about his role as the VP of Products at Docker, whether he knows it or not. Donnie, welcome to the show.

Donnie: Thanks.
I’m excited to be here.Corey: So, the burning question that I have that inspired me to reach out to you is fundamentally, and very bluntly and directly, Docker was a thing in, I want to say the 2015-ish era, where there was someone who gave a parody talk for five minutes where they got up and said nothing but the word Docker over and over again, in a bunch of different tones, and everyone laughed because it seemed like, for a while, that was what a lot of tech conference talks were about 50% of the way. It’s years later, now, and it’s 2021 as of the time of this recording. How is Docker relevant today?Donnie: Great question. And I think one that a lot of people are wondering about. The way that I think about it, and the reason that I joined Docker, about six months back now, was, I saw the same thing you did in the early 2010s, 2013 to 2016 or so. Docker was a brand new tool, beloved of developers and DevOps engineers everywhere. And they took that, gained the traction of millions of people, and tried to pivot really hard into taking that bottom-up open-source traction and turning it into a top-down, kind of, sell to the CIO and the VP operations, orchestration management, kind of classic big company approach. And that approach never really took off to the extent that would let Docker become an explosive success commercially in the same way that it did across the open-source community and building out the usability of containers as a concept.Now, new Docker, as of November 2019, divested all of the top-down operations production environment stuff to Mirantis and took a look at what else there was. And the executive staff at the time, the investors thought there might be something in there, it’s worth making a bet on the developer-facing parts of Docker to see if the things that built the developer love in the first place were commercially viable as well. 
And so looking through that we had things left like Docker Hub, Docker Engine, things like Notary, and Docker Desktop. So, a lot of the direct tools that developers use on a daily basis to get their jobs done when they’re working on modern applications, whether that’s twelve-factor, whether that’s something they’re trying to lift and shift into a container, whatever it might look like, it’s still used every day. And so the thought was, there might be something in here.Let’s invest some money, let’s invest some time and see what we can make of it because it feels promising. And fast-forward a couple of years—we’re in early 2021—we just announced our Series B investment because the past year has shown that there’s something real there. People are using Docker heavily; people are willing to pay for it, and where we’re going with it is much higher level than just containers or just a registry. I think there’s a lot more opportunity there. When I was watching the market as a whole drifting toward Kubernetes, what you can see is, to me, it’s a lot like a repeat of the old OpenStack days where you’ve got tons of vendors in the space, it’s extremely crowded, everybody’s trying to sell the same thing to the same small set of early adopters who are ready for it.Whereas if you look at the developer side of containers, it’s very sparsely populated. Nobody’s gone hard after developers in a bottom-up self-service kind of way and helped them adopt containers and helped them be more productive doing so. 
So, I saw that as a really compelling opportunity and one where I feel like we’ve got a lot of runway ahead of us.

Corey: Back in the early days—this is a bit of a history lesson that I’m sure you’re aware of, but I want to make sure that my understanding winds up aligning with yours—Docker was transformative when it was announced—I want to say 2012, in Santa Clara, but don’t quote me on that one—and, effectively, what it promised to solve was—I mean, containerization was not a new idea. We had that with LPARs on mainframes way before my time. And it’s sort of iterated forward ever since. What it fundamentally solved was the tooling around those things where suddenly it got rid of the problem of, “Well, it worked on my machine.” And the rejoinder from the grumpy ops person—which I very much was—was, “Great. Then back up your email because your laptop’s about to go into production.”

By having containers, suddenly you have an environment or an application that was packaged inside of a mini-environment that was able to be run basically anywhere. And it was, write once, deploy basically as many times as you want. And over time, that became incredibly interesting, not just for developers, but also for folks who were trying to migrate applications. You can stuff basically anything into a container. Whether you should or not is a completely separate conversation that I am going to avoid by a wide margin. Am I right so far in everything that I have said there?

Donnie: Yep. Absolutely.

Corey: Awesome. So, then we have this container runtime that handles the packaging piece. And then people replaced Docker in that cherished position in their hearts—which is the thing that they talk about, even when you beg them to stop—with Kubernetes, which is effectively an orchestration system for containers, invariably Docker. And now people are talking about that constantly and consistently.
If we go back to looking at similar things in the ecosystem, people used to care tremendously about what distribution of Linux they ran.And then—well, okay. If not the distro, definitely the OS wars of, is this Windows or is this a Linux workload? And as time has gone on, people care about that less and less where they just want the application to work; they don’t care what it’s running in under the hood. And it feels that the container runtime has gotten to that point as well. And soon, my belief is that we’re going to see the orchestrator slip below that surface level of awareness of things people have to care about, if for no other reason than if you look at Kubernetes today, it is fiendishly complicated, and that doesn’t usually last very long in this space before there’s an abstraction layer built that compresses all of that into something you don’t really have to think about, except for a small number of people at very specific companies. Does that in any way change, I guess, the relevance of Docker to developers today? Or am I thinking about this the wrong way with viewing Docker as a pure technology, instead of an ecosystem?Donnie: I think it changes the relevance of Docker much more to platform teams and DevOps teams—as much as I wish that wasn’t a word or a term—operations groups that are running the Kubernetes environments, or that are running applications at scale in production, where maybe in the early days, they would run Docker directly in prod, then they moved to running Docker as a container runtime within Kubernetes, and more recently, the core of Docker—which was containerd—as a replacement for that overall Docker, which used dockershim. So, I think the change here is really around, what does that production environment look like? And where we’re really focusing our effort is much more on the developer experience. 
I think that’s where Docker found its magic in the first place was in taking incredibly complicated technologies and making them really easy in a way that developers love to use. So, we continue to invest much more on the developer tools part of it, rather than what does the shape of the production environment look like?And how do we horizontally scale this to hundreds or thousands of containers? Not interesting problems for us right now. We’re much more looking at things like how do we keep it simple for developers so they can focus on a simple application. But it is an application and not just a container, so we’re still thinking of moving to things that developers care about. They don’t necessarily care about containers; they care about their app.So, what’s the shape of that app, and how does it fit into the structure of containers? In some cases, it’s a single container, in some cases, it’s multiple containers. And that’s where we’ve seen Docker Compose pick up as a hugely popular technology. When we look at our own surveys, when we look at external surveys, we see on the order of two-thirds of people who use Docker using Compose to do it, either for ease of automation and reproducibility or for ease of managing an application that spans across multiple containers as a logical service, rather than try and shove it all in one and hope it sticks.Corey: I used to be relatively, I guess, cynical about Docker. In fact, one of my first breakout talks started life as a lightning talk called “Heresy in the Church of Docker,” where I just came up with a list of a few things that were challenging and didn’t fully understand. It was mostly jokes, and the first half of it was set to the backstory of an embarrassing chocolate coffee explosion that a boss of mine once had. And that was great. Like, what’s the story here? What’s the relevance? Just a story of someone who didn’t understand their failure modes of containers in production. Cue laugh.And that was great. 
And someone came up to me and said, “Hey, can you give the full version of that talk at ContainerCon?” To which my response was, “There’s a full version?” Followed immediately by, “Absolutely.” And it sort of took life from there.Now, I want to say that talk hasn’t aged super well because everything that I highlighted in that talk has since been fixed. I was just early and being snarky, and I genuinely, when I gave that first version, didn’t understand the answers. And I was expecting to be corrected vociferously by an awful lot of folks. Instead, it was, “Yeah, these are challenges.” At which point I realized, “Holy crap, maybe everyone isn’t 80 years ahead of me in technical understanding.” And for better or worse, it’s set an interesting tone.Donnie: Absolutely. So, what do you think people really took out of that talk that surprised you?Corey: The first thing that I think, from my perspective, that caught me by surprise was that people are looking at me as some sort of thought leader—their term, not mine—and my response was, “Holy crap. I’m not a thought leader. I’m just a loud, white guy in tech.” And yep, those are pretty much the same thing in some circles, which is its own series of problems. But further, people were looking at this and taking it seriously, as in, “Well, we do need to have some plans to mitigate this.”And there are different discussions that went back and forth with folks coming up with various solutions to these things. And my first awareness, at least, that pointing out problems where you don’t know the answer is not always a terrible thing; it can be a useful thing as well. And it also—let me put a bit of a flag there as far as a point in time because looking back at that talk, it’s naive. I’ve done a bunch of things since then with Docker. 
I mean, today, I run Docker on my overpowered Mac to have a container that’s listening with our syslog.And I have a bunch of devices around the house that are spitting out their logs there, so when things explode I have a rough idea of what happened. It solves weird problems. I wind up doing a number of deployment processes here for serverless nonsense via Docker. It has become this pervasive technology that if I were to take an absolutist stance that, “Oh, Docker is terrible. I’m never going to use Docker.”It’s still here for me, and it’s still available and working. But I want to get back to something you said a minute ago because my use of Docker is very much the operations sysadmin-with-title-inflation whatever we’re calling them this week; that use case and that model. Who is Docker viewing as its customer today? Who as a company are you identifying as the people with the painful problem that you can solve?Donnie: For us, it’s really about the developer, rather than the ops team. And specifically it’s about the development team. And this to me is a really important distinction because developers don’t work in isolation; developers collaborate together on a daily basis, and a lot of that collaboration is very poorly solved. You jump very quickly from, “I’m doing remote pairing in my code editor,” to, “It’s pushed to GitHub, and it’s now instantly rolling into my CI pipeline on its way to production.” There’s not a lot of intermediate ground there.So, when we think about how developers are trying to build, share, and run modern applications, I think there’s a ton of whitespace in there. We’ve been sharing a bunch of experiments, for anybody who’s interested. We do community all-hands every couple of months where we share, here’s some of the things we’re working on. And importantly, to me, it’s focused on problems. 
Everything you were describing in that heresy talk was about problems that exist, and pointing out problems.And those problems, for us, when we talk to developers using Docker, those problems form the core of our roadmap. The problems we hear the most often as the most frustrating and the most painful, guess what? Those are the things we’re going to focus on as great opportunities for us. And so we hear people talking about things like they’re using Docker, or they’re using containers, but they have a really hard time finding the good ones. And they can’t create good ones, they are just looking for more guidance, more prescription, more curation, to be able to figure out where’s this good stuff amidst the millions of containers out there? How do I find the ones that are worth using, for me as an individual, for me as a team, and for me as a company. I mean, all of those have different levels of requirements and expectations associated with them.Corey: One of the perceptions I’ve had of the DevOps movement—as someone who started off as a grumpy Linux systems administrator—is the sense that they’re trying to converge application developers with infrastructure engineers at some point. And I started off taking a very, “Oh, I’m not a developer. I don’t write code.” And then it was, “Huh. You know, I am writing an awful lot of configuration, often in something like Ruby or Python.” And of course, now it seems like everyone has converged as developers with the lingua franca of all development everywhere, which is, of course, YAML. Do you think there’s a divide between the ops folks and the application developers in 2021?Donnie: You know, I think it’s a long journey. Back when I was at RedMonk, I wrote up a post talking about the way those roles were changing, the responsibilities were shifting over time. 
And you step back in time, and it was very much, you know, the developer owns the dev stack, the local stack, or if there’s a remote developer environment, they’re 100% responsible for it. And the ops team owned production, 100% responsible for everything in that stack. And over the past decade, that’s clearly been evolving.They could still own their code in production and get the value out of understanding how that was used, the value of fast iteration cycles, without having to own it all, everywhere, all of the time, and have to focus their time on things that they had really no time or interest to spend it on. So, those things have both been happening to me, not in parallel, quite; I think DevOps in terms of ops learning development skillsets and applying those has been faster than development teams who were taking ownership for that full lifecycle and that iteration all the way to production, and then back around. Part of that is cultural in terms of what developer teams have been willing to do. Part of it is cultural in terms of what the old operations teams—now becoming platform engineering teams—have been willing to give up, and their willingness to sacrifice control. There’s always good times like PCI compliance, and how do you fight those sorts of battles.And when I think about it, it’s been rotating. And first, we saw infrastructure teams, ops teams, take more ownership for being a platform, in a lot of cases, either guided by the emerging infrastructure automation config management tools like CFEngine back in the early 90s, which turned into Puppet and Chef, which turned into Ansible and Salt, which now continue to evolve beyond those. A lot of those enabled that rotation of responsibilities where infrastructure could be a platform rather than an ops team that had to take ownership of overall production. And that was really, to me, it was ops moving into a development mindset, and development capabilities, and development skillsets. 
Now, at the same time, development teams were starting to have the ability to take over ownership for their code running into production without having to take ownership over the full production stack and all the complexities involved in the hardware, and the data centers, and the colos, or the public cloud production environments, whatever they may be.So, there’s a lot of barriers in the way, but to me, those have been all happening alongside, time-shifted a little bit. And then really, the core of it was as those two groups become increasingly similar in how they think and how they work, breaking down more of the silos in terms of how they collaborate effectively, and how they can help solve each other’s problems, instead of really being separate worlds.Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.Corey: Docker was always described as a DevOps tool. And well, “What is DevOps?” “Oh, it’s about breaking down the silos between developers and the operations folks.” Cool, great. Well, let’s try this. And I used to run DevOps teams. I know, I know, don’t email me. When you’re picking your battles, team naming is one of the last ones I try to get to.But then we would, okay, I’m going to get this application that is in a container from development. Cool. It’s—don’t look inside of it, it’s just going to make you sad, but take these containers and put them into production and you can manage them regardless of what that application is actually doing. 
It felt like it wasn’t so much breaking down a wall, as it was giving a mechanism to hurl things over that wall. Is that just because I worked in terrible places with bad culture? If so, I don’t know that I’m very alone in that, but that’s what it felt like.Donnie: It’s a good question. And I think there’s multiple pieces to that. It is important. I just was rereading the Team Topologies book the other day, which talks about the idea of a team API, and how do you interface with other teams as people as well as the products or platforms they’re supporting? And I think there’s a lot of value in having the ability to throw things over a wall—or down a pipeline; however you think about it—in a very automated way, rather than going off and filing a ticket with your friendly ITSM instance, and waiting for somebody else to take action based on that.So, there’s a ton of value there. The other side of it, I think, is more of the consultative role, rather than the take work from another team and then go do another thing with it, and then pass it to the next team down and then so on, unto eternity. Which is really, how do you take the expertise across all those teams and bring it together to solve the problems when they affect a broader radius of groups. And so, that might be when you’re thinking about designing the next iteration of your application, you might want to have somebody with more infrastructure expertise in the room, depending on the problems you’re solving. You might want to have somebody who has a really deep understanding of your security requirements or compliance requirements if you’re redesigning an application that’s dealing with credit card data.But all those are problems that you can’t solve in isolation; you have to solve them by breaking down the barriers. 
Because the alternative is you build it, and then you try and release it, and then you have a gatekeeper that holds up a big red flag, delays your release by six months so you can go back and fix all the crap you forgot to do in the first place.Corey: While on the topic of being able to, I guess, use containers as sort of as these agnostic components, I suppose, and the effects that that has, I’d love to get your take on this idea that I see that’s relatively pervasive, which is, “I can build an application inside of containers”—and that is, let’s be clear, that is the way an awful lot of containers are being built today. If people are telling you otherwise, they’re wrong—“And then just run it in any environment. You’ve built an application that is completely cloud agnostic.” And what cloud you’re going to run it in today—or even your own data center—is purely a question of either, “What’s the cheapest one I can use today?” Or, “What is my mood this morning?” And you press a button and the application lives in that environment flawlessly, regardless of what that provider is. Where do you stand on that, I guess, utopian vision?Donnie: Yeah, I think it’s almost a dystopian vision, the way I think about it—which is the least common denominator approach to portability—limits your ability to focus on innovation rather than focusing on managing that portability layer. There are cases where it’s worth doing because you’re at significant risk, for some reason, of focusing on a specific portability platform versus another one, but the bulk of the time, to me, it’s about how do you focus your time and effort where you can create value for your company? Your company doesn’t care about containers; your company doesn’t care about Kubernetes; your company cares about getting value to their customers more quickly. So, whatever it takes to do that, that’s where you should be focusing as much time and energy as possible. 
So, the container interface is one API of an application, one thing that enables you to take it to different places, but there’s lots of other ones as well.I mean, no container runs in isolation. I think there’s some quote, I forget the author, but, “No human is an island” at this point. No container runs in isolation by itself. No group of containers do, either. They have dependencies, they have interactions, there’s always going to be a lot more to it, of how do you interact with other services?How do you do so in a way that lets you get the most bang for your buck and focus on differentiation? And none of that is going to be from only using the barest possible infrastructure components and limiting yourself to something that feels like shared functionality across multiple cloud providers or multiple other platforms.Corey: This gets into the sort of the battle of multi-cloud. My position has been that, first, there are a lot of vendors that try and push back against the idea of going all-in on one provider for a variety of reasons that aren’t necessarily ideal. But the transparent thing that I tend to see—or at least I believe that I see—is that well, if fundamentally, you wind up going all-in on a provider, an awful lot of third-party vendors will have nothing left to sell you. Whereas as long as you’re trying to split the difference and ride multiple horses at once, well, there’s a whole lot of painful problems in there that you can sell solutions to. That might be overly cynical, but it’s hard to see some stories like that.Now, that’s often been misinterpreted as that I believe that you should always have every workload on a single provider of choice and that’s it. I don’t think that makes sense, either. I mean, I have my email system run in GSuite, which is part of Google Cloud, for whatever reason, and I don’t use Amazon’s offering for the same because I’m not nuts. 
Whereas my infrastructure does indeed live in AWS, but I also pay for GitHub as an example—which is also in the Azure business unit because of course it is—and different workloads live in different places. That’s a naive oversimplification, but in large companies, different workloads do live in different places.Then you get into stories such as acquisitions of different divisions that are running in completely different providers. I don’t see any real reason to migrate those things, but I also don’t see a reason why you have to have single points of control that reach into all of those different application workloads at the same time. Maybe I’m oversimplifying, and I’m not seeing a whole subset of the world. Curious to hear where you stand on that one?Donnie: Yeah, it’s an interesting one. I definitely see a lot of the same things that you do, which is lots of different applications, each running in their own place. A former colleague of mine used to call it ‘best execution venue’ over at 451. And what I don’t see, or almost never see, is that unicorn of the single application that seamlessly migrates across multiple different cloud providers, or does the whole cloud-bursting thing where you’ve got your on-prem or colo workload, and it seamlessly pops over into AWS, or Azure, or GCP, or wherever else, during peak capacity season, like tax season if you’re at a tax company, or something along those lines. You almost never see anything that realistically does that because it’s so hard to do and the payoff is so low compared to putting it in one place where it’s the best suited for it and focusing your time and effort on the business value part of it rather than on the cost minimization part and the risk mitigation part of, if you have to move from one cloud provider to another, what is it going to take to do that? Well, it’s not going to be that easy. 
You’ll get it done, but it’ll be a year and a half later, by the time you get there and your customers might not be too happy at that point.

Corey: One area I want to get at is, you talk about, now, addressing developers where they are and solving problems that they have. What are those problems? What painful problem does a developer have today as they’re building an application that Docker is aimed at solving?

Donnie: When we put the problems that we’re hearing from our customers into three big buckets, we think about that as building, sharing, and running a modern application. There’s lots of applications out there; not all of them are modern, so we’re already trying to focus ourselves into a segment of those groups where Docker is really well-suited and containers are really well-suited to solve those problems, rather than something where you’re kind of forklifting it in and trying to make it work to the best of your ability. So, when we think about that, what we hear a lot of is three common themes. Around building applications, we hear a lot about developer velocity, about time being wasted, both sitting at gatekeepers, but also searching for good reusable components. So, we hear a lot of that around building applications, which is, give me developer velocity, give me good high-trust content, help me create the good stuff so that when I’m publishing the app, I can easily share it, and I can easily feel confident that it’s good.

And on the sharing note, people consistently say that it’s very hard for them to stay in sync with their teams if there’s multiple people working on the same application or the same part of the codebase. It’s really challenging to do that in anything resembling a real-time basis. You’ve got the repository, which people tend to think of—whether that’s a container repository, or whether that’s a code repository—they tend to think of that as, “I’m publishing this.” But where do you share?
Where do you collaborate on things that aren’t ready to publish yet?

And we hear a lot of people who are looking for that sort of middle ground of how do I keep in sync with my colleagues on things that aren’t ready to put that stamp on where I feel like it’s done enough to share with the world? And then the third theme that we hear a lot about is around running applications. And when I distinguish this against old Docker, the big difference here is we don’t want to be the runtime platform in production. What we want to do is provide developers with a high-fidelity, consistent kind of experience, no matter which environment they’re working with. So, if they’re in their desktop, if they’re in their CI pipeline, or if they’re working with a cloud-hosted developer environment, or even production, we want to provide them with that same kind of feeling experience.

And so an example of this was last year, we built these Compose plugins that we call code-to-cloud plugins, where you could deploy to ECS, or you could deploy to ACI, Azure Container Instances, in addition to being able to do a local Compose up. And all of that gives you the same kind of experience because you can flip between one Docker context and the other and run, essentially, the same set of commands. So, we hear people trying to deal with productivity, trying to deal with collaboration, trying to deal with complex experiences, and trying to simplify all of those. So, those are really the big areas we’re looking at: the build, share, run themes.

Corey: What does that mean for the future of Docker? What is the vision that you folks are aiming at that goes beyond just, I guess—I’m not trying to be insulting when I say this, but the pedestrian concerns of today? Because viewed through the lens of the future, looking back at these days, every technical problem we have is going to seem, on some level, like it’s, “Oh, it’s easy.
There’s a better solution.” What does Docker become in 15 years?

Donnie: Yeah, I think there’s a big gap between where people edit their code, where people save their source code, and that path to production. And so, we see ourselves as providing really valuable development tools that—we’re not going to be the IDE and we’re not going to be the pipeline, but we’re going to be a lot of that glue that ties everything together. One thing that has only gotten worse over the years is the amount of fragmentation that’s out there in developer toolchains, developer pipelines, similar with the rise of microservices over the past decade, it’s only gotten more complicated, more languages, more tools, more things to support and an exponentially increasing number of interconnections where things need to integrate well together. And so that’s the problem that, really, we’re solving is all those things are super-complicated, huge pain to make everything work consistently, and we think there’s a huge amount of value there in tying that together for the individual, for the team.

Corey: Donnie, thank you so much for taking the time to speak with me today. If people want to learn more about what you’re up to, where can they find you?

Donnie: I am extremely easy to find on the internet. If you Google my name, you will track down, probably, ten different ways of getting in touch. Twitter is the one where I tend to be the most responsive, so please feel free to reach out there. My username is @dberkholz.

Corey: And we will, of course, put a link to that in the show notes. Thanks so much for your time. I really appreciate the opportunity to explore your perspective on these things.

Donnie: Thanks for having me on the show. And thanks everybody for listening.

Corey: Donnie Berkholz, VP of products at Docker. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that explains exactly why you should be packaging up that comment and running it in any cloud provider just as soon as you get Docker’s command-line arguments squared away in your own head.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

This has been a HumblePod production. Stay humble.
About Owen

Owen Rogers wears many hats at 451 Research; he’s research director of cloud transformation and digital economics and head of the quantum computing centre of excellence. Prior to these positions, Owen was a doctoral researcher in cloud computing at the University of Bristol, completing his PhD thesis in 2013; a product portfolio manager at Claranet; and an independent product management and cloud computing consultant, among other positions.

Join Corey and Owen as they talk about what it’s like when two cloud economists meet at an event but only one has a PhD, what exactly an industry analyst does, how 451 Research found that 53% of companies increased cloud spend during the pandemic and what resources they’re investing in, the Law of Cloud Entropy and why Owen believes the cloud will only get more disordered in the future, why it’s easy for cloud costs to spiral out of control, how organizations are trying to rein in cloud spend despite using more cloud services, why Owen doesn’t believe we’ll reach cloud commoditization anytime soon, and more.

Links:
451 Research: https://451research.com/
Cloud Price Index: https://451research.com/services/price-indexing-benchmarking/cloud-price-index
Twitter: https://twitter.com/owenrog

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward.
It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It’s an awesome approach. I’ve used something similar for years. Check them out. But wait, there’s more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It’s awesome. If you don’t do something like this, you’re likely to find out that you’ve gotten breached, the hard way. Take a look at this. It’s one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That’s canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I’m a big fan of this. More from them in the coming weeks.

Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by Owen Rogers, who’s a research director of cloud and managed services transformation at 451 Research, a part of S&P Global Market Intelligence.
Owen, thank you for tolerating my slings and arrows.

Owen: Lovely to be here, and I look forward to them.

Corey: So, you got your PhD back in 2013, in cloud economics. And I know this because when we first met at a cloud economics event, I called myself a cloud economist and you lit up like a Christmas tree. “Oh, my God. Someone else does what I do.” And I just gave myself the title because I thought it was something I’d invented, whereas you actually got a PhD in it, and you made the understandable assumption that I knew what I was talking about.

And I knew I had two directions I could go in. The first was, to be honest and come clean, and the other was to basically string you along until we co-publish a book. It comes out in two weeks, and its title is—I’m kidding. I’m kidding. But thank you for being as gracious about me stomping on your credentials back then, as you were.

Owen: No, I was relieved, actually, that someone else was looking into this because I thought I was just on my own and I’d made this big gamble by coming up with this PhD and taking a bit of a risk. And when I saw you at that event, I was like, “Thank God it’s not just me. Thank God, this actually might be a thing to pay attention to.” So, thanks for being there.

Corey: No, by all means, it turns out that there’s a very narrow subset of people who care about these things, and that tends to be a somewhat insular circle. So, it was nice to finally meet someone who was a bit outside of the nuts and bolts of lowering bills and looking at the broader implications across the market. But we’ll get there. You’re an industry analyst. What does that mean?

Owen: So essentially, we are the intermediary party between buyers and sellers. So, we help sellers of services and goods work out who to sell to, how to sell, and we help buyers work out what product is best for them. And we do this by conducting market research across buyers and sellers, pretty much.
So, my specialism is cloud economics, so it’s my job to help solve a lot of this complexity that’s going on in cloud pricing for enterprises, and to help service providers and tech vendors sell and price appropriately.

Corey: Well, let’s define terms as well. One of the hardest ones is ‘cloud.’ In fact, the reason I called myself a cloud economist is because it was two words that no one could accurately define in almost any context. Cloud generally meant a bunch of people’s computers that weren’t yours, and economists generally meant someone who claimed to know everything about money but dressed like a disaster victim. And put the two together and no one had any idea what the hell I did, and in many cases, they still don’t, which is kind of ideal from my perspective.

But when you say cloud, what does that mean? Are we talking Infrastructure as a Service? Are we moving up the stack into SaaS? Are we taking a Microsoft-ian definition and including LinkedIn revenue as part of their cloud unit for some godforsaken reason? What is it? Where do you start? Where do you stop?

Owen: I mean, when we both started looking at cloud economics, it was all about the infrastructure because people wanted virtual machines and storage, and there was a huge amount of complexity there. But I think as the market has become more mature, increasingly we’re seeing people want to use all these other services, up to the platform level and the software level. So, I’m interested primarily in infrastructure and platform, but the thing about cloud providers nowadays is, there doesn’t seem to be any barriers to where they want to go.
And I think you and I and others who are in this field are going to have to broaden our horizons and start thinking about everything because cloud is becoming the center of all IT, really.

Corey: What I find strange is that the further afield I go from my core competency in this space, which is looking at the AWS side, Infrastructure as a Service spend—which is not small in most companies and not getting any smaller—as soon as you start diverging from there, the requests that I start seeing from customers are all over the map. Some of them are trying to work on their Microsoft licensing. Others are trying to optimize some random SaaS tool’s billing because that’s top of mind at some point. It immediately shatters into 1000 different niches. But the common thread that I’ve always found was the AWS bill. And let me be very clear: that’s a function of who I talk to in the market in which I live, here in San Francisco. That does not mean that AWS is in every type of company of every profile; just the ones I started talking to and figured out that I could help.

Owen: Yeah. That would make sense to me. I think COVID particularly, has made companies realize that cloud is an option. So, even though not everybody was using the cloud hyperscalers one or two years ago, I think over the past year, even if you weren’t dabbling in the cloud, perhaps you’ve started to play around. And even though optimization might not be the first thing you think of when you start using the cloud, as time goes on and things start getting carried away, then that’s when this optimization is going to become more important.

So, for example, we found 53% of enterprises we surveyed are using more cloud services as a result of the pandemic because they’ve had to change things.
And even though they’ve probably spun it up really quickly because they’ve needed to grab hold of it quickly and take hold of the opportunity as soon as they could, in years to come there’s going to be so much complexity, and they’re all going to have different requirements—as you said—because they’re all using things in different ways to address their different needs as the pandemic has gone on.

Corey: Talk to me a little bit about the COVID spike that you’re seeing. Is that people spinning up a bunch of new VMs? Is it people leveraging different video conferencing services, like Zoom or—God forbid—WebEx? Is it something else entirely?

Owen: I think there’s two areas, and pretty much what you said. So, there’s some companies who are scaling their existing cloud resources. So, those who have built scalable applications, they’re just adding more and more virtual machines or auto-scaling so that they can keep up with demand. And as you said, that could be something like video conferencing, authentication, VDI, anything like that. But then there’s also companies who have had to really rapidly change business model because a year ago, they weren’t selling things online, they weren’t able to deliver, everything was bought in a shop.

And now they’ve had to rapidly get access to cloud resources and rebuild their businesses, almost. And I think there’s lots of new cloud users as a result of the pandemic who thought, “How are we going to suddenly get this infrastructure? We’re going to have to go to the hyperscalers to get used to it, get hold of it right now.”

Corey: One of the things that caught me by surprise, in the early days of the pandemic was a number of different companies whose business models weren’t really extending to online things in the same way.
They were all based around real-world, physical commerce, for lack of a better term, and saw their user traffic and their e-commerce traffic fall off a cliff, but their infrastructure spend remained relatively stable. At which point we realized, “Oh, interesting, when everyone talks about being able to auto-scale, they just mean up.” And that makes sense on some visceral level because if you don’t scale up, you drop customers on the floor. If you don’t scale down, well, it just costs a little bit of extra money and that’s not the end of the world, comparatively. And suddenly seeing people in somewhat dire straits, in a company context, and having to renegotiate their commits with different providers was something of an eye-opener.

Owen: Yeah, yeah. And I identified this term, which is ‘cloud entropy.’ And I came up with a term called the ‘Law of Cloud Entropy.’ And what that essentially means is, cloud is only likely to get more disordered over time because most enterprises, as you said, would rather just leave things running, would rather scale up than risk scaling down because if they scale up, it means their applications can still continue to run, they’re not going to be shot in the foot by a server going down, but if they’re scaling down too quickly, then they’ve got a lot to lose and only a tiny cost saving to make. So, it’s almost like the cloud model is inherently risky, in terms of costs running away with it because things can happen automatically and there’s so much to lose by getting it wrong.

Corey: Absolutely. After your cost-saving exercise winds up causing an outage, you’re generally not allowed to save money anymore.

Owen: Yeah. Yeah, totally. Like, why risk saving a few cents, when it could bring your business down? But that’s the thing. I mean, it’s not just a few cents anymore, is it?
Because over time, people consume more and more resources, things aren’t being managed correctly; it’s really easy for those costs to spiral out of control.

And that is not just a few cents. It’s thousands, tens of thousands of dollars. And there is a point where you think, “Well, actually, I am going to have to do this now because costs are spiraling, and it is time to take that step into optimizing and cutting my costs.”

Corey: I’ve got to level with you, it does not stop at tens of thousands of dollars. Many of my clients wish it did. “Sure, we can eat that no problem.” It becomes something so much deeper, and it grows without any bounds on it. Say you spin up an instance with the idea that you’ll just experiment on something and then turn it off in a couple of days. If you don’t proactively turn it off yourself, you’re going to retire before that instance does. It’ll sit there costing you, every hour of the day.

Owen: Well, I’ve done that, as I’m sure you have. I’ve spun up a virtual machine, played around, and then six months later, I’ve been like, “Oh, this is surprisingly expensive.” And then I’m like, “Oh, well. I’m not going to be able to expense this, am I?” It’s only going to get worse.

So, 49% of enterprises we survey say that cost savings are going to be a greater priority since COVID. So ironically, even though people are using more cloud, and perhaps these costs are spiraling out of control a bit, the fact of the matter is, they’ve never been under more pressure to try and quell it.

Corey: Oh, yeah. And it doesn’t get any easier when people look at these things in their own right. And it doesn’t lend itself to easy analysis, especially as you start getting into large swings, you have seasonal cycles, you have people buying reserved instances, or savings plans, or whatever the other provider equivalent is in bulk at certain times of the year.
And it’s very difficult to do accurate projections, especially when you don’t know the answer to a number of very pressing business questions. It almost becomes, in my case, marriage counseling between Finance and Engineering.

Owen: That’s such a shrewd observation because I think there is this huge disjoint between IT and Finance. And I can’t really see that being solved anytime soon because they’re both—Finance’s job is to save money, but IT’s is to keep things ticking over and to innovate. And unfortunately, it’s a compromise. But to get a compromise when neither party really understands each other’s field is really tricky.

Corey: Absolutely. If AWS were to somehow wave a magic wand and fix their billing—and, my God, I wish they would—I still have a business here. I still have credibility when talking to a customer about, “Is this the right level of spend? The right level of commit?” That you’re never going to have when your email address ends in the same domain as the vendor’s. And the ability to help them negotiate what those commits look like with that vendor is one of those business models that never goes out of style.

Owen: Yeah, there’s always going to be that negotiation. Although I think it’s not as big as it used to be. So, when it was, like, server hardware 20 years ago, the list price was nothing like the price that you’d actually pay. Whereas in cloud, I think, I don’t know if you agree, but the variation seems far smaller to me. It almost seems 10 to 15% rather than the 50 to 60% it might have been 20 years ago.

Corey: At certain points of scale, that no longer holds true.

Owen: Interesting. Interesting.

Corey: Not to name names, or specify numbers. Again, confidentiality matters. But at some point, when you wind up being a significant portion of a given service’s revenue, again, no one is paying retail, or anything even close to retail, at a certain point.

Owen: And things are only going to get more complicated.
So, we track all the things for sale from AWS, Google, Microsoft, and every week, we scan, now, 2 million individual line items for sale from those cloud providers. So, even if there was some kind of standardization list price with everything, that’s not going to apply to all of those different line items. So, I think for people like Duckbill, a lot of the need is to look at this whole bill, look at everything that’s being used for opportunities to optimize and negotiate, not just on the handful of services, which most enterprises are using.

Corey: When you say that you can consume all of those pieces of information in a single week, that tells me you’re doing some definite data crunching and big number processing, largely because it’s impossible to get that much clarity within a week. Do you find that the cloud providers themselves change pricing—other than on things like preemptible instances or the spot market—without an announcement?

Owen: Interesting. So, the Cloud Price Index, which I manage, essentially, every week we look at the websites of all these cloud providers, we go through their APIs, and we look at every price item they have and we compare it to the week before. And sometimes prices just go up and down just like a blip. It’s almost something’s gone wrong on the website or the API. But in 2020, we saw 4000 significant price cuts.

So, a significant price cut is one that is greater than 10%. So, sometimes prices go down over time and the cloud providers don’t make a big song-and-dance about it anymore. But other times prices do go up, and those prices seem to go up, in particular, when a product goes from almost a beta into general availability. And different cloud providers do it in different ways.
But yeah, I think prices are almost continually changing, and it’s almost like a sea of prices rather than thinking, “Oh, well, everything’s going down, or some things are going up.”

Things are going up and down all the time and it’s tricky to really know what’s going on. I think this is why cost optimization is going to be needed on an ongoing basis. Because it’s not just a one-off thing anymore, where you go and buy a bunch of reserved instances. You need to be constantly reassessing this all the time. And, like, we were talking about the synergy between IT and Finance, you need to work out what the company is going to be doing in a year’s time to see if it’s worthwhile investing in something to make those savings.

Corey: When you say that there are thousands of price changes, generally decreases, are they often correlated—in other words, if, “We’re going to be reducing the cost of the X instance family. The end.” But then there’s thousands of SKUs in some cases because they’re in all of the different regions, they have all the different pricing for the committed price, the reserve price, et cetera, et cetera, or are they making large-scale cuts and just not mentioning it? Because there was a time on the AWS side, which is where I live, where they would trumpet every minor cost reduction in some far-flung region for some service that basically no one used.

Owen: You’re right. A lot of those cuts are because of a family, or a particular region, or a generation. And obviously, one cut translates to thousands of individual line items, which again, shows the complexity for companies to deal with because they’ve got to understand that one change can affect a whole range of different things. It’s not just one change anymore; it’s tens of thousands.

Corey: What I hope is that, at some point, we’re going to start seeing something approaching commoditization in the space, but the price that has never materially changed—well, that’s unfair.
The price that has generally never materially changed has been the egress fee for data transfer.

Owen: See, I don’t think we’re going to reach commoditization for a long time yet. And I think of it as a gas station analogy. So, if there’s a bunch of gas stations all on the same road, we all know that the cheapest gas station will be the one that probably gets most of the business because people are only buying gas. But the reality is, people go to gas stations for loads of different things. They go because one might have a nice restaurant; one might sell different chocolates and candy.

So, it’s not really about the commoditized offering of the gas. There’s loads of other things that would drive why you might choose one gas station over another. And I think that is the same with cloud providers. That yeah, they all might get similar prices for virtual machines at some point. But still, there’s going to be a reason why you might choose AWS, or Google, or Microsoft, or Oracle, or IBM, or Alibaba, or any of these folks. It’s going to be because of their whole portfolios and everything else they offer in trust and reliability, and regional access, and not just that single commodity price point which is their core business.

Corey: Part of the problem is, at least in my experience, when I look at the customer profile that I tend to engage with, they have the bulk of their expenses, across a very small number of services, almost always EC2, RDS, S3, Elastic Block Store, and data transfer. And everything else is, sort of, a bit of a rounding error. There are always going to be exceptions on this, but what that tells me is that despite all the high-level services that get trumpeted, and despite the flashy abilities, and capabilities, and savings opportunities, et cetera, et cetera, that get trotted out, during all the provider keynotes, people are still largely using this to run virtual machines and store data.
Is that a fair assessment from what you’re seeing?

Owen: I would strongly agree with you. And it’s because people know how to build applications on servers. There’s different skills, but people have got the skills already to some degree. Whereas if you want to use serverless, or these new analytics tools, or IoT, or machine learning, that’s a whole new skill set. I’m with you; I think the bulk of it is still the basic infrastructure items.

Corey: It really seems to be. And I can’t shake the feeling that as much as they want to give attention to the new stuff, it’s not a massive driver of people who are debating adopting the cloud. I really don’t think that it’s going to change anytime soon. If we take a look at AWS that has an annualized $51 billion run rate, and revenue at this point, which is just astonishing, it’s pretty clear that the next $51 billion is not going to come from the same customer profile. If anything, it’s going to come from what looks an awful lot like blue-chip companies, some of whom are in manufacturing, some of whom are in logistics, et cetera, et cetera.

They’re not web properties; they’re not Netflix-style companies. And to meet those people where they are, they have to embrace the edge a lot more closely, they have to tell a story where you can manage the data center and the cloud environment similarly. And if anything, it’s going to increase that trend, not decrease it.

Owen: How do you feel about multi-cloud, mate?

Corey: I was hoping we would get there.

Owen: [laugh].

Corey: I have thoughts on the matter, but I will do you the service of letting you start.

Owen: So, [laugh]. So, 59% of enterprises we talk to are pursuing a hybrid approach to IT. So, what that means in our language is, essentially enterprises want to make sure they can use different cloud providers. And the top reason they want to do that is because they want to choose which is the best expertise from each individual cloud provider.
So yeah, they might want a cloud provider A because they’ve got really cheap infrastructure, yadda, yadda, yadda.

But they still want to have the freedom to use cloud provider B because they’ve got these cool, sexy new analytics and stuff. And for me, I think the hyperscalers almost have to have these newer sexier services, not necessarily because lots of companies are going to use them and it’s going to erode all the commodity business, but more because if they don’t have them, it almost appears like a bit of a weakness because their competitors all have the same thing. And considering enterprises are so willing to consider multiple cloud environments, I think that more appropriately shows that you have to have these things because companies will look elsewhere if you don’t.

Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.

Corey: I’m not going to disagree with what you’re saying because I’m not sure of the direction it’s going to go in, yet. I’ve often been mischaracterized when I rant about multi-cloud being a worst practice that I’m saying that you should absolutely pick one provider and go all in. Full stop. And that’s never been what I intended to say. For example, personally, all my infrastructure lives on AWS, give or take a few things that are hosted WordPress, for example.

But my Git repositories live on GitHub because CodeCommit is a funny joke that people just haven’t realized is a joke yet.
And I use G Suite for email and the rest because WorkMail and WorkDocs are services that even now, you’re not sure I’m not making up. And that’s the way that I tend to view the multi-cloud story: different workloads in different places. And that tends to be fine because in this case, there’s not a whole lot of interaction between those things. The dumb version of multi-cloud, to my world—and I think you called it hybrid in some respects—is the idea of, “I’m going to take a workload that can seamlessly go to any different cloud provider at any time.” And in practice, it never does that, and it also winds up trading off a lot of the benefit of going to public cloud in the first place.

Owen: Yeah. That makes sense to me. And actually, in our data, that’s exactly what we found. So, it’s something like the average number of clouds used by the enterprises we survey is 2.2, but 80% of their workloads are deployed in one cloud.

So, I think you’re right; it is almost an aspiration. It’s just keeping your options open. Having the ability to move workloads between clouds constantly, we don’t see it either. It’s more about just having the ability to if you really needed to. Do you think some of that is because people are just scared of that lock-in, in your experience? Is it more of a psychological worry, than actual—a worry in reality?

Corey: Partially that. It’s also that there’s a vendor ecosystem where if you’re selling a shared control plane that can speak to all three of the primary tier-one cloud providers, and people aren’t using multiple cloud providers, you suddenly have nothing left to sell them. It’s also being sold, in many respects, by cloud providers who are painfully aware that if you go all-in on one cloud, it will not be theirs.

Owen: Mmm. Yeah, makes sense. But it’s been interesting over the past year or so how the hyperscalers have started talking more about hybrid and multi-cloud.
So, five years ago, I didn’t think AWS would ever have something like Outposts. And also their competitors. So Google has Anthos, where you can move workloads; Microsoft came out with Arc. So, it’s surprising to me that they’re all embracing this concept so readily.

Corey: Well, I do want to call out that there is a distinct difference in my mind, between using multiple cloud providers and having a hybrid structure where you have a data center and a cloud provider because everyone goes through a migration process there. In fact, a failed cloud migration is called, “We’re hybrid now,” because it turns out midway through, it’s super hard to move something so you give up and declare victory. No one generally sets out to live permanently with a foot in each world. What invariably happens then is they improve their data center at the expense of their cloud environment. And they really tend to treat the cloud more or less as an incredibly expensive place just to run a bunch of virtual machines, compared to what they would get economically on-prem. Now, that said, the raw infrastructure cost is only a small part of the story.

Owen: Yeah, because you’ve got the labor cost of running it yourself as well, right?

Corey: Which is always more expensive than the infrastructure. It’s an incredible rarity when we see the AWS bill costing more than payroll.

Owen: Then again, I think, you know, it’s not just the cost savings of having some of your cloud stuff on-prem, though, is it? I mean, the world is a complex place at the moment, pandemic, politics. I think some buyers like to have their data somewhere where they know, in a country they understand the compliance and the sovereignty. I mean, even though cloud is an easily accessible place, you still don’t have ownership of it all. There’s some things you just want to keep close to home. And that is a lot of the driver we see for the hybrid model.
The public cloud gives the flexibility but the on-prem cloud lets you still have some flexibility, but keep it all controlled and in your own arms.

Corey: To be very clear, I’m also speaking in the very general case. When I talk to individual clients who have made a different decision, my default assumption there is that they’ve thought through these things and have a reason for things being the way that they are. My problem—and why I started making noise about this topic—is that, in my experience, no one else was saying it, which means that if you don’t really know what’s going on and you listen to just the vendor hype, then you would think that, “Oh, I absolutely must build everything that I’m doing in the cloud to work on multiple providers on day one.” And that’s just not the case.

Owen: I see what you mean. Yes, that’s not the case. Just because people are using multiple venues doesn’t mean they’re all necessarily working together in any sensible way.

Corey: Exactly. This is part of the reason I have no partnerships with any vendor in this space. It’s the reason I don’t charge percentages of things. It never goes well. I wind up charging fixed-fee to my clients and then I tell them to do what I would do in their position. And I’ll explain my logic as I go, and everyone’s generally pretty happy with that.

Owen: Mmm. Makes sense.

Corey: For better or worse, it seems to solve the problem that folks have. But it’s a growing market; I’m never going to be able to talk to more than a very small percentage of it, and this problem has to be solved, on some level, systemically. Because if we look at cloud spend as an unbounded growth problem, well, first, it means that the cloud business is a great place to be if you’re one of the ones that’s making money at it. But it also means that at some point, there’s going to be some kind of a reckoning where people need to go back and play cloud environment archaeologist.
And this isn’t just a big company problem.

I’m the only person that was in most of our early accounts here at The Duckbill Group and I have to figure out what that idiot moron known as my past self was thinking when he tried to build some of this nonsense. And the short answer is, he had no idea, but it seemed like a good idea at the time.

Owen: I totally love that: ‘cloud bill archaeologist’ and I will be stealing that for a future reference. And I think you’re right, even during the pandemic, I bet loads of people have scaled up straightaway, thinking, “Well, we’ve got to capture the opportunity now, or we’re going to risk losing business.” And no one’s really planned it or looked at it for a year because, quite understandably, they’ve had bigger things to worry about. And in five years’ time, no one’s going to know what’s going on, what workload is tied to what specific application, who owns it. And the thought of even understanding it is challenging, let alone trying to optimize it.

And I was having a debate with my colleague, Jean Atelsek, today, and she was asking me if I thought one day, this could all be automated away. And I don’t think it can be automated away because there still needs to be someone who understands the business, to understand scaling, to understand if something’s worth an investment, to understand if you should scale up or down in response to a specific demand or project. So, I always think there’s going to be some kind of human intervention, just because humans will understand the needs of the business and relate them to how the cloud has to change.

Corey: The only other approaches, as I see it, besides my own are, “Oh, we’re going to build some tools that will solve all of this for you.” And they just don’t work. That’s terrific to wind up finding specific things, absolutely, but there’s no context to them.
There’s no idea of, “Should I optimize for this cluster for the long haul, or should I instead wind up focusing on it as this thing that I should immediately ignore?” As soon as you start getting three or four terrible recommendations in a row, you wind up in a space of not trusting the tool at all. Bad recommendations are worse than no recommendations.

Owen: So, why do you think that is? Why do you think the tools—I suppose the tools can’t predict the future so they don’t know the context of what needs to be done. Are there any other reasons why those tools have not been able to adapt? Why do you not think tools will have a longer-term impact?

Corey: Because in many cases, there’s no way to tell from a programmatic perspective. “Those idle instances that are sitting there? I’m going to recommend that they get turned off.” Well, a little more digging shows that they’re the DR site and you need three seconds of warning or so before they’re going to be under load. You can’t turn them off.

Whereas, buy a bunch of reserved instances on that particular cluster that someone just spun up for a one-week experiment and then they’re turning it off, doesn’t make a lot of sense, either. And as you step down this path, it becomes nuanced. There are times where that is this tiny little test environment, so no one is going to look at it or care. Except that that tiny little test environment is about to go hyperscale once the business deal gets signed, so now is absolutely the time to optimize stuff like that. There’s the idea of well, this data could be migrated to infrequent access one zone, and it winds up costing less money. Cool. That’s true, but if that data goes away, it winds up effectively destroying aspects of the business.

So, in that case, you should spend more money on backing that data up securely, in many cases, to another cloud provider. That’s the level of nuance. There’s a whole bunch of different things that a naive approach would suggest would be a good idea.
But a deeper dive into what the business is actually doing and the model that they’re working under, makes it the wrong direction to go.

Owen: I strongly agree.

Corey: And it gets worse than that because there’s this false narrative that companies care tremendously about saving money on the bill; that’s the thing that drives them. And it is just not true. Because it’s an inversion of monetary philosophy that people take on a personal level. If I offer you the opportunity, you can either make another $1,000, or save $1,000, you’re typically going to say you’d rather save the money because, well, you can cut Netflix out, you can stop eating out, and that works out well, whereas having to go ahead and make more money, that means you have to ask your boss for a raise and start doing odd jobs and update your resume, and it’s just a pain. Companies, on the other hand, are structured to drive revenue. There’s a theoretical cap of whatever they’re spending on cloud in total, that they’ll ever be able to cut off, but they can make multiples of that by launching the right feature to the right market at the right time.

Owen: I wonder if cost optimization is perhaps the wrong word and it should be something along the lines of value optimization because obviously, I don’t want any company reducing their virtual machines to save money because as you said, it’s going to reduce their opportunities to gain new revenue if it’s their web applications. Really, it’s about, “Well, this is where you should be putting your money, and this is where you’re wasting it.” It’s about optimizing their value, not saving them their costs.

Corey: Precisely. It comes down to what’s right for them, given the constraints that they’re working under. And again, it’s easy to go ahead and play, more or less, Wild West architecture, where you look at what they’re doing and say, “Oh, yeah.
This is all wrong, you should be doing it this way.” And you sketch out a beautiful architecture on a whiteboard—also known as a lie—and, yeah, in theory, it’s great.

In practice, they have existing business, it’s driving revenue, and you’re not going to be allowed to turn everything off for 18 months while you rebuild it. The money that you save doesn’t matter if you’re not in business by the time you’re in a position to realize those savings.

Owen: And perhaps after COVID, there will be loads of these servers and virtual machines and objects on object storage which are left there, just because it’s not really worth removing them. Because nobody knows what they are, it might bring down the whole business. Better just leave them there for the time being.

Corey: Well, that does bring up the last topic I wanted to bounce off of you. What is the outcome of all of this COVID stuff, once it is all past? What is the lasting after effect, if any, of COVID on cloud?

Owen: I think COVID will be a catalyst for cloud adoption. Some companies have changed their business models; they’ve aged collaboration; they’ve been able to change their businesses in a matter of weeks. And that’s been enabled because of the rapid scalability of cloud because they’ve been able to get a third-party to do physical server management and because they’ve been able to concentrate on changing and evolving their businesses instead of worrying about infrastructure. I think those who have succeeded by doing that are likely to keep doing that because it put them in good stead during the past challenging eight months. And those who hadn’t done that will now think, “Well, perhaps we should have done that.” And again, they’ll look at the cloud as a way of moving forward.
So COVID, although horrifically terrible for so many people, will probably be a catalyst for cloud adoption, and has demonstrated to the industry that cloud is a suitable venue for many, many workloads.

Corey: Owen, thank you so much for taking the time to speak with me and suffer my, I guess, less educated slash informed opinions on cloud economics. If people want to hear more about what you have to say, where can they find you?

Owen: So, you can find me on 451Research.com, or on Twitter; I’m @owenrog.

Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.

Owen: No, thank you very much.

Corey: Owen Rogers, research director, and cloud economist at 451, Division of S&P Global. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment listing all 4000 prices that changed.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

This has been a HumblePod production. Stay humble.
About Chadd

Chadd Kenney is the Vice President of Product at Clumio. Chadd has 20 years of experience in technology leadership roles, most recently as Vice President of Products and Solutions for Pure Storage. Prior to that role, he was the Vice President and Chief Technology Officer for the Americas helping to grow the business from zero in revenue to over a billion. Chadd also spent 8 years at EMC in various roles from Field CTO to Principal Engineer. Chadd is a technologist at heart, who loves helping customers understand the true elegance of products through simple analogies, solutions use cases, and a view into the minds of the engineers that created the solution.

Links:
Clumio: https://clumio.com/
Clumio AWS Marketplace: https://aws.amazon.com/marketplace/pp/prodview-ifixh6lnreang

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way.
It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.

Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications.

It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Periodically, I talk an awful lot about backups and that no one actually cares about backups, just restores; usually, they care about restores right after they discover they didn’t have backups of the thing that they really, really, really wish that they did. Today’s promoted guest episode is sponsored by Clumio. And I’m speaking to their VP of product, Chadd Kenney. Chadd, thanks for joining me.

Chadd: Thanks for having me. Super excited to be here.

Corey: So, let’s start at the very beginning. What is a Clumio? Possibly a product, possibly a service, probably not a breakfast cereal, but again, we try not to judge.

Chadd: [laugh]. Awesome. Well, Clumio is a Backup as a Service offering for the enterprise, focused in on the public cloud.
And so our mission is, effectively, to help simplify data protection and make it a much, much better experience for the end user, and provide a bunch of values that they just can’t get today in the public cloud, whether it’s in visibility, or better protection, or better granularity. And we’ve been around for a bit of time, really focused in on helping customers along their journey to the cloud.

Corey: Backups are one of those things where people don’t spend a lot of time and energy thinking about them until they are, I guess, befallen by tragedy in some form. Ideally, it’s something minor, but occasionally it’s, “Oh, yeah. I used to work at that company that went under because there was a horrible incident and we didn’t have backups.” And then people go from not caring to being overzealous converts. Based upon my focus on this, you can probably pretty safely guess which side of that chasm I fall into. But let’s start, I guess, with positioning; you said that you are backup for the enterprise. What does that mean exactly? Who are your customers?

Chadd: We’ve been trying to help customers into their cloud journey. So, if you think about many of our customers are coming from the on-prem data center, they have moved some of their applications, whether they’re lift-and-shift applications, or whether they’ve, kind of, stalled doing net-new development on-prem and are doing all net-new development in the public cloud. And we’ve been helping them along the way and solving one fundamental challenge, which is, “How do I make sure my data is protected? How do I make sure I have good compliance and visibility to understand, you know, is it working? And how do I be able to restore as fast as possible in the event that I need it?”

And you mentioned at the beginning backup is all about restore and we a hundred percent agree.
I feel like today, you get this [unintelligible 00:02:51] together a series of solutions, whether it’s a script, or it’s a backup solution that’s moved from on-prem, or it’s a snapshot orchestrator, but no one’s really been able to tackle the problem of, help me provide data protection across all of my accounts, all of my regions, all of my services that I’m using within the cloud. And if you look at it, the enterprise has transitioned dramatically to the cloud and doesn’t have great solutions to latch on to solve this fundamental problem. And our mission has been exactly that: bring a whole bunch of cool innovation. We’re built natively in the public cloud; we started off on a platform that wasn’t built on a whole bunch of EC2 instances that look like a box that was built on-prem, we built the thing mostly on Lambda functions, very event-driven. All AWS native services. We didn’t build any proprietary data structure for our environment. And it’s really been able to build a better user experience for our end customers.

Corey: I guess there’s an easy question to start with, of why would someone consider working with Clumio instead of AWS Backup, which came out a few months after re:Invent, I want to say 2018, but don’t quote me on that; may have been 2019. But it has the AWS label on the tin, which is always a mark of quality.

Chadd: [laugh]. Well, there’s definitely a fair bit to be desired on the AWS Backup front. And if you look at it, what we did is we spent, really, before going into development here, a lot of time with customers to just understand where those pains are. And I’ve nailed it, kind of, to four or five different things that we hear consistently. One is that there’s near zero insights; “I don’t know what’s going on with it.
I can’t tell whether I’m compliant or not compliant, or protecting not enough or too much.”

They haven’t really provided sufficient security on being able to airgap my data to a point where I feel comfortable that even one of my admins can’t accidentally fat-finger a script and delete, you know, whether the primary copy or secondary copy. Restore times have a lot to be desired. I mean, you’re using snapshots. You can imagine that doesn’t really give you a whole bunch of fine-grained granularity, and the timeframe it takes to get to it—even to find it—is kind of a time-consuming game. And they’re not cheap.

The snapshots are five cents per gig per month. And I will say they leave a lot to be desired for that cost basis. And so all of this complexity kind of built-in as a whole has given us an opportunity to provide a very different customer experience. And the difference between these two solutions is that we’ve been providing much better visibility just in the core solution. And we’ll be announcing here, on May 27, Clumio Discover which gives customers so much better visibility than what AWS Backup has been able to deliver.

And instead of them having to create dashboards and other solutions as a whole, we’re able to give them unique visibility into their environment, whether it’s global visibility, ensuring data is protected, doing cost comparisons, and a whole bunch of others. We allow customers to be able to restore data much faster, at fine-grained granularities, whether it’s at a file level, directory level, instance level, even in RDS we go down to the record level of a particular database with direct query access. And so the experience just as a whole has been so much simpler and easier for the end consumer, that we’ve been able to add a lot of value well beyond what AWS Backup uses.
Now, that being said, we still use snapshots for operational recovery at some level, where customers can still use what they do today but what Clumio brings is an enhanced version of that by actually using airgap protection inside of our service for those datasets as well. And so it allows you to almost enhance AWS Backup at some level if you think about it. Because AWS Backup really is just orchestrating the snapshots; we can do that exact same thing, too, but really bring the airgap protection solution on top of that as well.

Corey: I’ve talked about this periodically on the show. But one of the last, I guess, big migration projects I did when I was back in my employee days—before starting this place—was a project I’d done a few times, which was migrating an environment from EC2-Classic into a VPC world. Back in the dark times, before VPCs were a thing, EC2-Classic is what people used. And they were not just using EC2 in those environments, they were using RDS in this case. And the way to move an RDS database is to stop everything, take a final snapshot, then restore that snapshot—which is the equivalent of backup—to the new environment.

How long does that take? It is non-deterministic. In the fullness of time, it will be complete. That wasn’t necessarily a disaster restoration scenario, it was just a migration, and there were other approaches we theoretically could have taken, but this was the one that we decided to go with based upon a variety of business constraints. And it’s awkward when you’re sitting there, just waiting indefinitely for, it turns out, about 45 minutes in this case, and you think everything’s going well, but there’s really nothing else to do during those moments.

And that was, again, a planned maintenance, so it was less nerve-wracking than when the site is down and people are screaming. But it’s good to have that expectation brought into it.
But it was completely non-transparent; there was no idea what was going on, and in actual disasters, things are never that well planned or clear-cut. And at some level, the idea of using backup restoration as a migration strategy is kind of a strange one, but it’s a good way of testing backups. If you don’t test your backups, you don’t really have them in the first place. At least, that’s always been my philosophy. I’m going to theorize, unless this is your first day in business, that you sort of feel the same way, given your industry.

Chadd: Definitely. And I think the interesting parts of this is that you have the validation that the backup is occurring, which is—you need visibility on that functioning, at some level; like, did it actually happen? And then you need the validation that the data is actually in a state that I can recover—

Corey: Task failed successfully.

Chadd: [laugh]. Exactly. And then you need validation that you can actually get to the data. So, there’s snapshots which give you this full entire thing, and then you got to go find the thing that you’re looking for within it. I think one of the values that we’ve really taken advantage of here is we use a lot of the APIs within AWS first to get optimization in the way that we access the data.

So, as an example—on your EC2 example—we use EBS direct APIs, and we do change block tracking off of that, and we send the data from the customer’s tenancy into our service directly. And so there’s no egress charges, there’s no additional cost associated to it; it just goes into our service. And the customer pays for what they consume only. But in doing that, they get a whole bunch of new values. Now, you can actually get file-level indexing, I can search globally for files in an instance without having to restore the entire thing, which seems like that would be a relatively obvious thing to get to.

But we don’t stop there.
You could restore a file, you could go browse the file system, you could restore to an AMI, you could restore to another EC2 instance, you could move it to another account. In RDS, not an easy service to protect, I will say. You know, you get this game of, “I’ve got to restore the entire instance and then go find something to query the thing.” And our solution allows you direct query access, so we can see a schema browser, you can go see all of your databases that are in it, you can see all the tables, the rows in the table, you can do advanced queries to join across tables to go [unintelligible 00:10:00] results.

And that experience, I think, is what customers are truly looking forward to be able to provide additional values beyond just the restoration of data. I’ll give you a fun example that a SaaS customer was using. They have a centralized customer database that keeps all of the config information across all of the tenants.

Corey: I used to do something very similar with Route 53, and everyone looks at me strangely when I say it, but it worked at the time. There are better approaches now. But yeah, very common pattern.

Chadd: And so you get into a world where it’s like, I don’t want to restore this entire thing at that point in time to another instance, and then just pull the three records for that one customer that they screwed up. Instead, it would be great if I could just take those three records from a solution and then just import them into the database. And the funny part of this is that the time it takes to do all these things is one component, the accidentally forgetting to delete all the stuff that I left over from trying to restore the data for weeks at a time that now I pay for in AWS is just this other thing that you don’t ever think about. It’s like, inefficiencies built in with the manual operations that you build into this model to actually get to the datasets.
And so we just think there’s a better way to be able to see and understand datasets in AWS.

Corey: One of my favorite genres of fiction is reading companies’ DR plans for how they imagine a disaster is going to go down. And it’s always an exercise in hilarity. I was not invited to those meetings anymore after I had the temerity to suggest that maybe if the city is completely uninhabitable and we have to retreat to a DR site, no one cares about this job that much. Or if us-east-1 has burned to the ground over in AWS land, that maybe your entire staff is going to go quit to become consultants for 100 times more money by companies that have way bigger problems than you do. And then you’re not invited back.

But there’s usually a certain presumed scale of a disaster, where you’re going to swing into action and exercise your DR plan. Okay, great. Maybe the data center is not a smoking crater in the ground; maybe even the database is largely there; what if you lost a particular record or a particular file somewhere? And that’s where it gets sticky, in a lot of cases because people start wondering, “Do I just spend the time and rebuild that file from scratch, kind of? Do I do a full restore of the”—all I have is either nothing or the entire environment. You’re talking about row-level restores, effectively, for RDS, which is kind of awesome and incredible. I don’t think I’ve ever seen someone talking about that before. How does that map as far as, effectively, a choose-your-own-disaster dial?

Chadd: [laugh]. There’s a bunch of cool use cases to this. You’ve definitely got disaster recovery; so you’ve got the instance where somebody blew something away and you only need a series of records associated to it; maybe the SQL query was off. You’ve got compliance stuff.
Think about this for a quick sec: you’ve got an RDS instance that you’ve been backing up, let’s say you keep it for just even a year.

How many versions of that RDS database has AWS gone through in that period of time so that when you go restore that actual snapshot, you’ve got to rev the thing to the current version, which would take you some time [laugh] to get up and running, before you can even query the thing. And imagine if you do that, like, years down the road, if you’re keeping databases out there, and your legal team’s asking for a particular thing for discovery, let’s say. And you’ve got to now go through all of these iterations to try to get it back. The thing we decided to do that was genius on the [unintelligible 00:13:19] team was, we wanted to decouple the infrastructure from the data. So, what we actually do is we don’t have a database engine that’s sitting behind this.

We’re exporting the RDS snapshot into a Parquet file, and the Parquet file then gets queried directly from Athena. And that allows us to allow customers to go to any timeframe to be able to pull not-specific database engine data into—whether it’s a restore function, or whether I want to migrate to a new database engine, I can pull that data out and re-import it into some other engine without having to have that infrastructure be coupled so closely to the dataset. And this was, really, kind of a way for customers to be able to leverage those datasets in all sorts of different ways in the future, with being able to query the data directly from our platform.

Corey: It’s always fun talking to customers and asking them questions that they look at me as if I’ve grown a second head, such as, “Okay. So, in what disaster scenario are you going to need to restore your production database to a state that was in nine months ago?” And they look at me like I’ve just asked a ridiculous question because, of course, they’re never going to do that.
If the database is restored to a copy that was backed up more than 15 minutes or so in the past, there are serious problems. That’s why the recovery point objective—or RPO—of what is your data window of loss when you do a restore is so important for these plannings.

And that’s great. “Okay then, why do you have six years of snapshots of your database taken on an interval going back all that time, if you’re never going to ever restore to any of them?” “Well, something compliance.” Yeah. There are better stories for that. But people start keeping these things around almost as digital packrats, and then they wind up surprised that their backup bill has skyrocketed. I’m going to go out on a limb and presume—because if not, this is going to be a pretty awkward question—that you do not just backup management but also backup aging as far as life cycles go.

Chadd: Yeah. So, there’s a couple different ways that are fun for us is we see multiple different tiers within backup. So, you’ve got the operational recovery methodology, which is what people usually use snapshots for. And unfortunately, you pay that at a pretty high premium because it’s high value. You’re going to restore a database that maybe went corrupt, or got somehow updated incorrectly or whatever else, and so you pay a high number for that for, let’s say, a couple days; or maybe it’s just even a couple hours.

The unfortunate part is, that’s all you’ve got, really, in AWS to play with. And so, if I need to keep long-term retention, I’m keeping this high-value item now for a long duration. And so what we’ve done is we’ve tried to optimize the datasets as much as possible. So, on EC2 and EBS, we’ll dedupe and compress the datasets, and then store them in S3 on our tenancy.
And then there’s a lower cost basis for the customer.

They can still use operational recovery, we’ll manage that as part of the policy, but they can also store it in an airgap protected solution so that no one has access to it, and they can restore it to any of the accounts that they have out there.

Corey: Oh, separating access is one of those incredibly important things to do, just because, first, if someone has completely fat-fingered something, you want to absolutely constrain the blast radius. But two, there is the theoretical problem of someone doing this maliciously, either through ransomware or through a bad actor—external or internal—or someone who has compromised one of your staff’s credentials. The idea being that people with access to production should never be the people who have access to, in some cases, the audit logs, or the backups themselves in some cases. So, having that gap—an airgap as you call it—is critical.

Chadd: Mm-hm. The only way to do this, really, in AWS—and a lot of customers are doing this and then they move to us—is they replicate their snapshots to another account and vault them somewhere else. And while that works, the downside—and it’s not a true airgap, in a sense; it’s just effectively moving the data out of the account that it was created in. But you double the cost, so that sucks because you’re keeping your local copy, and then the secondary copy that sits on the other account. The admins still have access to it, so it’s not like it’s just completely disconnected from the environment. It’s still in the security sphere, so if you’re looking at a ransomware attack, trust me, they’ll find ways to get access to that thing and compromise it.
And so you have vulnerabilities that are kind of built into this altogether.

Corey: “So, what’s your security approach to keeping those two accounts separated?” “The sheer complexity that it takes to wind up assuming a role in that other account that no one’s going to be able to figure it out because we’ve tried for years and can’t get it to work properly.” Yeah, maybe that’s not plan A.

Chadd: Exactly. And I feel like while you can [unintelligible 00:17:33] these things together in various scripts, and solutions, and things, people are looking for solutions, not more complexity to manage. I mean, if you think about this, backup is not usually the thing that is strategic to that company’s mission. It’s something that protects their mission, but not drives their mission. It is our mission and so we help customers with that, but it should be something we can take off their hands and provide as a service versus them trying to build their own backup solution as a whole.

Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.

Corey: Back when I was an employee if I was being honest, people said, “So, what is the number one thing you’re always sure to do on a disaster recovery plan?” My answer is, “I keep my resume updated.” Because, on some level, you can always quit and go work somewhere else. That is honest, but it’s also not the right answer in many cases. You need to start planning for these things before you need them.

No one cares about backups until right after you really needed backups.
And keeping that managed is important. There are reasons why architectures around this stuff are the way that they are, but there are significant problems around how a lot of AWS implements these things. I wound up having to use a backup about a month or so ago when some of my crappy code deleted data—imagine that—from a DynamoDB table, and I have point-in-time restores turned on. Cool. So, I just roll it back half an hour and that was great. The problem is, there was about four megabytes of data in that table, and it took an hour to do the restore into a new table and then migrate everything back over, which was a different colossal pain. And I’m sure there are complicated architectural reasons under the hood, but it’s like, that is almost as slow as someone who’s retyped it all by hand, and it’s an incredibly frustrating experience. You also see it with EBS snapshots: you back up an EBS volume with a snapshot—it just copies the data that’s there. Great—every time there’s another snapshot taken, it just changes the delta. And that’s the storage it gets billed to. So, what does that actually cost? No one really knows. They recently launched direct APIs for EBS snapshots; you can start at least getting some of that data out of it if you just write a whole bunch of code—preferably in a Lambda function because that’s AWS’s solution for everything—but it’s all a plumbing solution where you’re spending all your time building the scaffolding and this tooling. Backups are right up there with monitoring and alerting for the first thing I will absolutely hurl to a third party.

Chadd: I a hundred percent agree. It’s—

Corey: I know you’re a third-party. You’re, uh, you’re hardly objective on this.

Chadd: [laugh].

Corey: But again, I don’t partner with anyone. I’m not here to shill for people. You can’t buy my opinion on these things.
I’ve been paying third parties to back things up for a very long time because that’s what makes sense.

Chadd: The one thing that I think, you know, we hit on at the beginning a little bit was this visibility challenge—and this was one of the big launches around Clumio Discover that’s coming out on May 27th there—is we found out that there was near-zero visibility, right? And so you’re talking about the restore times, which is one key component, but [laugh]—

Corey: Yeah, then you restore after four hours and discover you don’t have what you thought you did.

Chadd: [laugh]. And so, I would love to see, like, am I backing things up? How much am I paying for all of these things? Can I get to them fast? I mean, the funny thing about the restore that I don’t think people ever talk about—and this is one of the things that I think customers love the most about Clumio—is, when you go to restore something, even that DynamoDB database you talked about earlier, you have to go actually find the snapshot in a long scroll.

So first, you had to go to the service, to the account, and scroll through all of the snapshots to find the one that you actually want to restore with—and by the way, maybe that’s not a monster amount for you, but in a lot of companies that could be thousands, tens of thousands of snapshots they’re scrolling through—and they’ve got a guy yelling at them to go restore this as soon as possible, and they’re trying to figure out which one it is; they hunt-and-peck to find it. Wouldn’t it be nice if you just had a nice calendar that showed you, “Here’s where it is, and here’s all the different backups that you have on that point in time.” And then just go ahead and restore it then?

Corey: Save me from the world of crappy scripts for things like this that you find on GitHub. And again, no disrespect to the people writing these things, but it’s clear that people are scratching their own itch. That’s the joy of open-source.
Yeah, this is the backup script—or whatever it is—that works on the ten instances I have in my environment. That’s great.

You roll that out to 600 instances and everything breaks. It winds up hitting rate limits as it tries to iterate through a bunch of things rather than building a queue and working through the rest of it. It’s very clearly aimed at small-scale stuff and built by people who aren’t working in large-scale environments. Conversely, you wind up with sort of the Google problem when you look at solving it for just the giant environments. Great, that you wind up with this overengineered, enormously unwieldy solution. Like, “Oh yeah, the continental saw. We use it to wind up cutting continents in half.” It’s, “I’m trying to cut a sandwich in half here. What’s the problem here?”

It becomes a hard problem. The idea of having something that scales and has decent user ergonomics is critically important, especially when you get to something as critical as backups and restores. Because you’re not generally allowed to spend months on end building a backup solution at a company, and when you’re doing a restore, it’s often 3 a.m. and you’re bleary-eyed and panicked, which is not the time to be making difficult decisions; it’s the time to be clicking the button.

Chadd: A hundred percent agree. I think the lack of visibility, this being a solution, less a problem I’m trying to solve [laugh] on my own is, I think, one area no one’s really tackled in the industry, especially around data protection. I will say people have done this on-prem at a decent level, but it just doesn’t exist inside the public cloud today. Clumio Discover, as an example, is one thing that we just heard constantly.
It was like, “Give me global visibility to see everything in one single pane of glass across all my accounts, ensure all of my data is protected, optimize the way that I’m spending in data protection, identify if I’ve got massive outliers or huge consumers, and then help me restore faster.”

And the cool part with Discover is that we’re actually giving this away to customers for free. They can go use this whether they’re using AWS Backup or us, and they can now see all of their environment. And at the same time, they get to experience Clumio as a solution in a way that is vastly different than what they’re experiencing today, and hopefully, they’ll continue to expand with us as we continue to innovate inside of AWS. But it’s a cool value for them to be able to finally get that visibility that they’ve never had before.

Corey: Did you know that AWS users can have multiple accounts and have resources in those accounts in multiple regions?

Chadd: Oh, yeah. Lots of them.

Corey: Yeah. Because—the reason that you know that, apparently, is that you don’t work for AWS Backup where, last time I checked, there are still something like eight or nine regions that they are not present in. And you have to wind up configuring this, in many cases, separately, and of course, across multiple accounts, which is a modern best practice: separate things out by account. There we go. But it is absolutely painful to wind up working with.

Sure, it’s great for small-scale test accounts where I have everything in a single account and I want to make sure that data doesn’t necessarily go on walkabout. Great. But I can’t scale that in the same way without creating a significant management problem for myself.

Chadd: Yeah, just the amount of accounts that we see in enterprises is nuts. And with people managing this at an account level, it’s unbearable.
And with no visibility, you’re doing this without really an understanding of whether you’re successfully executing this across all of those accounts at any point in time. And so this is one of the areas that we really want to help enterprises with. It’s, not only make the protection simple but also validate that it’s actually occurring. Because I think the one thing that no one likes to talk about in this is the whole compliance game, right? Like—

Corey: Yeah, doing something is next to useless; you got to prove that you’re doing the thing.

Chadd: Yeah. I got an auditor who shows up once a quarter and says, “Show me this backup.” And then I got to go fumble to try to figure out where that is. And, “Oh, my God. It’s not there. What do I tell the guy?” Well, wouldn’t it be nice if you had this global compliance report that showed you whether you were compliant, or if it wasn’t—which, you know, maybe it wasn’t for a snapshot that you created—at least would tell you why. [laugh]. Like, an RPO was exceeded on the amount of time it took to take the snapshot. Okay, well, that’s good to know. Now, I can tell the guy something other than just make something up because I have no information.

Corey: So, you’d have multiple snapshots in flight simultaneously; always a great plan. Talk to me a little bit about Discover, your new offering. What is it? What led to it?

Chadd: I love talking to customers, for one, and we spend a lot of time understanding where the gaps exist in the public cloud. And our job is to help fill those gaps with really cool innovation. And so the first one we heard was, “I cannot see multiple services, regions, accounts in one view. I had to go to each one of the services to understand what’s going on in it versus being able to see all of my assets in one view. I’ve got a lot of fragmented reporting. I’ve got no compliance view whatsoever.
I can’t tell if I’m over-protecting or under-protecting.”

Orphan snapshots are the bane of many people’s existence, where they’ve taken snapshots at some point, deleted an EC2 instance, and they pay monthly for these things. We’ve got an orphan snapshot report. It will show you all of the snapshots that exist out there with no EC2 instance associated to it, and you can go take action on it. And so, what Discover came from is customers saying, “I need help.” And we built a solution to help them.

And it gives them actionable insights, globally, across their entire set of accounts, across various different services, and allows them to do a whole bunch of fun stuff, whether it’s actionable and, “Help me delete all my orphan snapshots,” to, “I’ve got a 30-day retention period. Show me every snapshot that’s over 30 days. I’d like to get rid of that one, too.” Or, “How much are my backups costing me in snapshots today?”

Corey: Yeah, today, the answer is, “[mumble].”

Chadd: [laugh]. And imagine being able to see that with, effectively, a free tool that gives you actionable insights. That’s what Discover is. And so you pair that with Clumio Protect, which is our backup solution, and you’ve got a really awesome solution to be able to see everything, validate it’s working, and actually go protect it, whether it’s operational recovery, or a true airgap solution, which is really hard to pull off in AWS today.

Corey: One problem that’s endemic to the backup space is that from a customer perspective, you are either invisible, or you have failed them. There are remarkably few happy customers talking about their experience with their backup vendor. So, as a counterpoint to that, what do the customers love about you folks?

Chadd: So, first and foremost, customers love the support experience. We are a SaaS offering, and we manage the backups completely for the end-user; there’s no cloud infrastructure the customer has to manage.
You know, there’s a lot of these fake SaaS offerings out there where I better deploy a thing and manage it in my tenancy. We’ve created an experience that allows our support organization to help customers proactively support it, and we become an extension to those infrastructure teams, and really help customers to make sure they have great visibility and understanding what’s going on in their environment. The second part is just a completely new customer experience.

You’ve got simplicity around the way that I add accounts, I create a policy, I assign a tag, and I’m off and running. There’s no management or hand-holding that you need to do within the system. The system scales to any size environment, and you know, you’re off and running. And if you want to validate anything, you can validate it via compliance reports, audit reports, activity reports. And you can see all of your accounts, data assets, in one single pane of glass, and now with Clumio Discover, you get the ability to be able to see it in one single view and see history, footprint, and all sorts of other fun stuff on top of it. And so it’s a very different user experience than what you see in any other solution that’s out there for data protection today.

Corey: Thank you so much for taking the time to speak with me today. If people want to learn more about Clumio and kick the tires for themselves, what should they do?

Chadd: So, we are on AWS Marketplace, so you can get us up and running there and test us out. We give you $200 of free credits, so you can not only use our operational recovery, which is, kind of, snapshot management, similar database backup, which is free. You can check out Clumio Discover, which is also free, and see all of your accounts and environments in one single pane of glass with some awesome actionable insights, as we mentioned.
And then you can reach out to us directly on clumio.com, where you can see a whole bunch of great content, blog posts, and the like, around our solution and service. And we’re looking forward to hearing from you.

Corey: Excellent. And we will, of course, throw links to that in the [show notes 00:29:57]. Thank you so much for taking the time to speak with me today. I appreciate it.

Chadd: Well, thank you so much for having me. I had an awesome time. Thank you.

Corey: Chadd Kenney, VP of product at Clumio. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a very long-winded comment that you accidentally lose because the page refreshes, and you didn’t have a backup.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
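
Transcript note: the orphan-snapshot and 30-day-retention reports Chadd describes in the Discover discussion come down to two filters over a snapshot inventory. The Python below is a minimal sketch under stated assumptions, not Clumio's implementation. The `Snapshot` record, its field names, and the sample data are all hypothetical; a real inventory would be gathered from the AWS APIs across accounts and regions, where EBS snapshots reference a source volume rather than an instance directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical inventory record; field names are illustrative only."""
    snapshot_id: str
    source_instance_id: str  # instance the snapshot was originally taken for
    created_at: datetime
    size_gib: int


def find_orphans(snapshots, live_instance_ids):
    """Return snapshots whose source instance no longer exists."""
    live = set(live_instance_ids)
    return [s for s in snapshots if s.source_instance_id not in live]


def find_expired(snapshots, retention_days, now):
    """Return snapshots older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [s for s in snapshots if s.created_at < cutoff]


# Made-up sample data for illustration.
now = datetime(2021, 5, 27, tzinfo=timezone.utc)
inventory = [
    Snapshot("snap-a", "i-111", now - timedelta(days=3), 100),
    Snapshot("snap-b", "i-222", now - timedelta(days=45), 50),
    Snapshot("snap-c", "i-333", now - timedelta(days=400), 200),
]
live_instances = ["i-111", "i-222"]  # i-333 has been deleted

orphans = find_orphans(inventory, live_instances)
expired = find_expired(inventory, retention_days=30, now=now)
orphan_gib = sum(s.size_gib for s in orphans)

print("orphans:", [s.snapshot_id for s in orphans], f"({orphan_gib} GiB)")
print("past 30-day retention:", [s.snapshot_id for s in expired])
```

Keeping the filters as pure functions over plain records separates the reporting logic from the inventory collection, which is where the cross-account and cross-region effort discussed in the episode would actually go.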
Comments (1)

Felipe Alvarez

it seems the volume changes from high to low every few seconds. please fix?

Jun 10th