Software at Scale

59 Episodes


Software at Scale 59 - Incident Management with Nora Jones

2023-07-05 | 44:06

Nora is the CEO and co-founder of Jeli, an incident management platform.

Apple Podcasts | Spotify | Google Podcasts

Nora provides an in-depth look into incident management within the software industry and discusses the incident management platform Jeli.

Nora's fascination with risk and its influence on human behavior stems from her early career in hardware and her involvement with a home security company. These experiences revealed the high stakes associated with software failures, uncovering the importance of learning from incidents and fostering a blame-aware culture that prioritizes continuous improvement. In contrast to the traditional blameless approach, which seeks to eliminate blame entirely, a blame-aware culture acknowledges that mistakes happen and focuses on learning from them instead of assigning blame. This approach encourages open discussions about incidents, creating a sense of safety and driving superior long-term outcomes.

We also discuss chaos engineering: the practice of deliberately creating turbulent conditions in production to simulate real-world scenarios. This approach allows teams to experiment and acquire the skills they need to respond to incidents effectively.

Nora then introduces Jeli, an incident management platform that places a high priority on the human aspects of incidents. Unlike other platforms that concentrate solely on technology, Jeli aims to bridge the gap between technology and people. By emphasizing coordination, communication, and learning, Jeli helps organizations reduce incident costs and cultivate a healthier incident management culture.

We discuss how customer expectations in the software industry have evolved over time, with users becoming increasingly intolerant of low reliability, particularly in critical services (Dan Luu has an incredible blog on the incidence of bugs in day-to-day software).
This shift in priorities has compelled organizations to place greater importance on reliability and invest in incident management practices. We conclude by discussing how incident management will further evolve and how leaders can set their organizations up for success. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
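Chaos engineering, as described in this episode, means deliberately introducing failures so that teams can practise responding to them before a real incident does it for them. The sketch below is a hypothetical, application-level illustration and not Jeli's or any particular chaos tool's API: a wrapper that, with a configurable probability, raises an injected error instead of calling the real dependency, so the fallback path actually gets exercised. The names `with_fault_injection`, `fetch_profile`, and `fetch_with_fallback` are invented for the example, and a seeded random generator keeps each experiment reproducible.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised instead of the real call when the experiment injects a failure."""


def with_fault_injection(
    call: Callable[[], T], failure_rate: float, rng: random.Random
) -> Callable[[], T]:
    """Wrap a zero-argument call so that it fails with probability `failure_rate`.

    Hypothetical helper for illustration; real chaos tooling usually injects
    faults at the network or infrastructure layer rather than in app code.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")

    def wrapped() -> T:
        # rng.random() is uniform on [0, 1), so this fires ~failure_rate of the time.
        if rng.random() < failure_rate:
            raise InjectedFault("chaos experiment: injected failure")
        return call()

    return wrapped


def fetch_profile() -> str:
    # Stand-in for a real downstream dependency (hypothetical).
    return "profile-ok"


def fetch_with_fallback(fetch: Callable[[], str]) -> str:
    # The code path the experiment is meant to exercise: degrade, don't crash.
    try:
        return fetch()
    except InjectedFault:
        return "profile-cached"


flaky_fetch = with_fault_injection(fetch_profile, failure_rate=0.3, rng=random.Random(42))
results = [fetch_with_fallback(flaky_fetch) for _ in range(1000)]
print(results.count("profile-cached"), "of 1000 calls used the fallback")
```

In a real experiment the interesting output is not the failure count but what the team and the system did when the failures arrived.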

Software at Scale 58 - Measuring Developer Productivity with Abi Noda

2023-06-13 | 49:29

Abi Noda is the CEO and co-founder of DX, a developer productivity platform.

Apple Podcasts | Spotify | Google Podcasts

My view on developer experience and productivity measurement aligns extremely closely with DX’s view. The productivity of a group of engineers cannot be measured by tools alone: there are too many qualitative factors, like cross-functional stakeholder bureaucracy or inefficiency, and inherent domain or codebase complexity, that tools cannot measure. At the same time, there are some metrics, like whether an engineer has committed any code changes in their first week or month, that serve as useful guardrails for engineering leadership. A combination of tools and metrics may provide a holistic view of, and insights into, the engineering organization’s throughput.

In this episode, we discuss the DX platform and Abi’s recently published research paper on developer experience. We talk about how organizations can use tools and surveys to iterate on and improve developer experience and, ultimately, engineering throughput.

GPT-4 generated summary

In this episode, Abi Noda and I explore the landscape of engineering metrics and a quantifiable approach to developer experience. Our discussion ranges from the value of developer surveys and system-based metrics to the tangible ways in which DX is innovating in the field.

We begin our conversation with a comparison of developer surveys and system-based metrics. Abi explains that while developer surveys offer a qualitative perspective on tool efficacy and user sentiment, system-based metrics present a quantitative analysis of productivity and code quality.

The discussion then moves to the real-world applications of these metrics, with Pfizer and eBay as case studies. Pfizer, for example, uses a model where they employ metrics for a detailed understanding of developer needs, which in turn drives strategic decision-making.
They have used these metrics to identify bottlenecks in their development cycle and strategically address these pain points. eBay, on the other hand, uses insights from developer sentiment surveys to design tools that directly enhance developer satisfaction and productivity.

Next, our dialogue around survey development centered on the dilemma between standardization and customization. While standardization offers cost efficiency and benchmarking opportunities, customization acknowledges the unique nature of every organization. Abi proposes a blend of both to cater to different aspects of developer sentiment and productivity metrics.

The highlight of the conversation was the introduction of DX's innovative data platform. The platform consolidates data across internal and third-party tools in a ready-to-analyze format, giving users the freedom to build their own queries, reports, and metrics. The ability to combine survey and system data allows the unearthing of unique insights, marking a distinctive advantage of DX's approach.

In this episode, Abi Noda shares enlightening perspectives on engineering metrics and the role they play in shaping the developer experience. We delve into how DX's unique approach to data aggregation and its potential applications can lead organizations toward more data-driven and effective decision-making.
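The "first commit within the first week or month" guardrail from this episode is simple to compute once start dates and commit dates are available. The sketch below uses made-up records and a hypothetical `Engineer` type; it is not DX's data model or methodology. It flags engineers whose first commit took longer than a threshold, and engineers with no commit yet only once they have been on the team longer than that threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Engineer:
    # Hypothetical record shape; real data would come from HR and version control.
    name: str
    start_date: date
    first_commit: Optional[date]  # None if nothing has been committed yet


def missed_first_commit_guardrail(
    engineers: List[Engineer], threshold_days: int, today: date
) -> List[str]:
    """Names of engineers whose first commit took longer than `threshold_days`.

    Engineers with no commit yet are flagged only once they have been on the
    team for longer than the threshold, so brand-new hires are not flagged.
    """
    flagged = []
    for e in engineers:
        if e.first_commit is None:
            waited = (today - e.start_date).days
        else:
            waited = (e.first_commit - e.start_date).days
        if waited > threshold_days:
            flagged.append(e.name)
    return flagged


# Made-up team data for illustration.
team = [
    Engineer("ana", date(2023, 5, 1), date(2023, 5, 3)),
    Engineer("ben", date(2023, 5, 1), date(2023, 5, 20)),
    Engineer("cy", date(2023, 5, 1), None),
    Engineer("dee", date(2023, 6, 10), None),
]
print(missed_first_commit_guardrail(team, threshold_days=7, today=date(2023, 6, 13)))
# ['ben', 'cy']
```

In keeping with the episode's framing, a flag here is a prompt for a conversation (is onboarding blocked on access, environment setup, or a review queue?) rather than a productivity score.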

Software at Scale 57 - Scalable Frontends with Robert Cooke

2023-05-16 | 55:42

Robert Cooke is the CTO and co-founder of 3Forge, a real-time data visualization platform.

Apple Podcasts | Spotify | Google Podcasts

In this episode, we delve into Wall Street's high-frequency trading evolution and the importance of high-volume trading data observability. We examine traditional software observability tools, such as Datadog, and contrast them with 3Forge’s financial observability platform, AMI.

GPT-4 generated summary

In this episode of the Software at Scale podcast, Robert Cooke, CTO and co-founder of 3Forge, a comprehensive internal tools platform, shares his journey and insights. He outlines his career trajectory, which includes prominent positions such as Infrastructure Lead at Bear Stearns and Head of Infrastructure at Liquidnet, and his work on high-frequency trading systems that employ software and hardware to make rapid, automated trading decisions based on market data.

Cooke elucidates how 3Forge empowers subject matter experts to automate trading decisions by encoding business logic. He underscores the criticality of robust monitoring systems around these automated trading systems, drawing an analogy with nuclear reactors due to the potentially catastrophic repercussions of any malfunction.

The dialogue then shifts to the impact of significant events like the COVID-19 pandemic on high-frequency trading systems. Cooke postulates that these systems can falter under such conditions, as they are designed to follow developer-encoded instructions and lack the flexibility to adjust to unforeseen macro events. He refers to past instances like the Facebook IPO and Knight Capital's downfall, where automated trading systems were unable to handle atypical market conditions, highlighting the necessity for human intervention in such scenarios.

Cooke then delves into how 3Forge designs software for mission-critical scenarios, drawing an analogy with military strategy.
Using the OODA loop concept (Observe, Orient, Decide, Act), they can respond swiftly to situations like outages. He argues that traditional observability tools only address the first step, whereas their solution facilitates quick orientation and decision-making, substantially reducing reaction time.

He cites a scenario involving a sudden surge in Facebook orders where their tool allows operators to detect the problem in real time, comprehend the context, decide on the response, and promptly act on it. He extends this example to situations like government incidents or emergencies where an expedited response is paramount.

Additionally, Cooke emphasizes the significance of low-latency UI updates in their tool. He explains that their software uses an online programming approach, reacting to changes in real time and updating only the altered components. As data size increases and reaction time becomes more critical, this feature becomes increasingly important.

Cooke concludes this segment by discussing the evolution of their clients' use cases, from initially needing static data overviews to progressively demanding real-time information and interactive workflows. He gives the example of users being able to comment on a chart and that comment being immediately visible to others, akin to the real-time collaboration features in tools like Google Docs.

In the subsequent segment, Cooke shares his perspective on choosing the right technology to drive business decisions. He stresses the importance of understanding the history and trends of technology, having experienced several shifts in the tech industry since his early software writing days in the 1980s. He projects that while computer speeds might plateau, parallel computing will proliferate, leading to CPUs with more cores.
He also predicts continued growth in memory, both in terms of RAM and disk space.

He further explains his preference for web-based applications due to their security and absence of installation requirements. He underscores the necessity of minimizing the data in the web browser and shares how they have built every component from scratch to achieve this. Their components are designed to handle as much data as possible, constantly pulling in data based on user interaction.

He also emphasizes the importance of constructing a high-performing component library whose components integrate seamlessly with one another, providing a consistent user experience. He asserts that developers often face confusion when required to combine different components, since these components tend to behave differently. He envisions a future where software development involves no JavaScript or HTML, a concept that he acknowledges may be unsettling to some developers.

Using the example of a dropdown menu, Cooke explains how a component initially designed for a small amount of data might eventually need to handle much larger data sets. He emphasizes the need to design components to handle the maximum possible data from the outset to avoid such issues.

The conversation then pivots to the concept of over-engineering. Cooke argues that building a robust and universal solution from the start is not over-engineering but an efficient approach. He notes the significant overlap in applications' use cases, which makes it advantageous to create a component that can cater to a wide variety of needs.

In response to the host's query about selling software to Wall Street, Cooke advocates targeting the most demanding customers first. He believes that if a product can satisfy such customers, it's easier to sell to others.
He argues that it's challenging to start with a simple product and then scale it up for more complex use cases, but feasible to start with a complex product and tailor it for simpler use cases.

Cooke further describes their process of creating a software product. Their strategy was to focus on core components, striving to make them as efficient and effective as possible. This involved investing years in foundational elements like string libraries and data marshalling. After establishing a robust foundation, they could then layer on additional features and enhancements. This approach eventually allowed them to produce a mature and capable product.

He also underscores the inevitability of users pushing software to its limits, regardless of how well it is optimized, and therefore argues for creating software that is as fast as possible right from the start. He refers to an interview with Steve Jobs, who argued that the best developers can create software that's substantially faster than others can. Cooke's team continually seeks ways to refine and improve the efficiency of their platform.

Next, the discussion shifts to team composition and the necessary attributes for software engineers. Cooke emphasizes the importance of a strong work ethic and a passion for crafting good software. He explains how his ambition to become the best software developer from a young age has shaped his company's culture, fostering a virtuous cycle of hard work and dedication among his team.

The host then emphasizes the importance of engineers working on high-quality products, suggesting that problems and bugs can sap energy and demotivate a team. Cooke concurs, comparing the experience of working on high-quality software to working on an F1 race car, and describing how the pursuit of refinement and optimization is a dream for engineers.

The conversation then turns to the importance of having a team with diverse thought processes and skill sets.
Cooke recounts how the introduction of different disciplines and perspectives in 2019 profoundly transformed his company.

The dialogue then transitions to the state of software solutions before the introduction of their high-quality software, touching upon the compartmentalized nature of systems in large corporations and the problems that arise from it. Cooke explains how their solution offers a more comprehensive and holistic overview that cuts across different risk categories.

Finally, in response to the host's question about open-source systems, Cooke expresses reservations about the use of open-source software in a corporate setting. However, he acknowledges the extensive overlap and redundancy among the many new systems being developed. Although he does not identify any specific groundbreaking technology, he believes the rapid proliferation of similar technologies might lead to considerable technical debt in the future.

Host Utsav wraps up the conversation by asking Cooke about his expectations and concerns for the future of technology and the industry. Cooke voices his concern about the continually growing number of different systems and technologies that companies are adopting, which makes integrating and orchestrating all these components a challenge. He advises companies to exercise caution when adopting multiple technologies simultaneously.

However, Cooke also expresses enthusiasm about the future of 3Forge, a platform he has devoted a decade of his life to developing. He expresses confidence in the unique approach and discipline employed in building the platform. Cooke is optimistic about the company's growth and marketing efforts and their focus on fostering a developer community. He believes that the platform will thrive as developers share their experiences and the product gains momentum.

Utsav acknowledges the excitement and potential challenges that lie ahead, especially in managing community-driven systems.
Utsav concludes the conversation by inviting Cooke to return for another discussion in the future to review how the topic has progressed. Both express their appreciation for the fruitful discussion before ending the podcast.
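Cooke describes the UI reacting to changes in real time and updating only the altered components instead of redrawing everything. The sketch below illustrates that general idea with a hypothetical keyed snapshot diff; it is not 3Forge's AMI implementation, and the row keys and fields are invented. Only rows that were inserted, changed, or deleted produce an update message, so a client view touches just those rows.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Update = Tuple[str, str, Optional[Row]]  # (operation, row key, new row or None)


def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> List[Update]:
    """Keyed updates that turn `old` into `new`; unchanged rows emit nothing."""
    updates: List[Update] = []
    for key, row in new.items():
        if key not in old:
            updates.append(("insert", key, row))
        elif old[key] != row:
            updates.append(("update", key, row))
    for key in old:
        if key not in new:
            updates.append(("delete", key, None))
    return updates


def apply_updates(view: Dict[str, Row], updates: List[Update]) -> None:
    """Apply updates in place to a client-side copy of the data."""
    for op, key, row in updates:
        if op == "delete":
            view.pop(key, None)
        else:
            view[key] = row  # insert and update are both a keyed write


# Made-up order rows: order-1 changed, order-2 did not, order-3 is new.
before = {"order-1": {"qty": 100}, "order-2": {"qty": 50}}
after = {"order-1": {"qty": 5000}, "order-2": {"qty": 50}, "order-3": {"qty": 10}}
updates = diff_snapshots(before, after)
print(updates)
# [('update', 'order-1', {'qty': 5000}), ('insert', 'order-3', {'qty': 10})]

client_view = dict(before)
apply_updates(client_view, updates)
print(client_view == after)
# True
```

Diffing whole snapshots is the simplest way to show the idea; a system built for high-volume data would more likely propagate change events from the source so that it never has to compare full snapshots at all.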

Software at Scale 56 - SaaS cost with Roi Rav-Hon

2023-04-17 | 28:29

Roi Rav-Hon is the co-founder and CEO of Finout, a SaaS cost management platform.

Apple Podcasts | Spotify | Google Podcasts

In this episode, we review the challenge of maintaining reasonable SaaS costs for tech companies. Usage-based pricing models for infrastructure lead to a gradual ramp-up of costs, and cost has always sneakily come up as a priority in my career as an infrastructure/platform engineer. So I’m particularly interested in how engineering teams can better understand and track infrastructure costs, “shift left” cost tracking, and prevent regressions.

We specifically go over Kubernetes cost management, and why costs need to be attributable to the most specific teams possible for cost management to be self-governing in an organization.
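The point about attributing cost to the most specific team can be illustrated with a small roll-up. The sketch assumes each workload's monthly cost is already known and that workloads carry a `team` label, in the spirit of Kubernetes labels; the data, the label key, and the `Workload` type are all hypothetical, and this is not Finout's method. Anything without an owner lands in an explicit `unattributed` bucket so that it stays visible instead of disappearing into a shared total.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workload:
    # Hypothetical record; in Kubernetes, labels would come from pod metadata.
    name: str
    monthly_cost_usd: float
    labels: Dict[str, str] = field(default_factory=dict)


def cost_by_team(workloads: List[Workload], label: str = "team") -> Dict[str, float]:
    """Sum cost per owning team; workloads without the label are 'unattributed'."""
    totals: Dict[str, float] = defaultdict(float)
    for w in workloads:
        owner = w.labels.get(label, "unattributed")
        totals[owner] += w.monthly_cost_usd
    return dict(totals)


# Made-up workloads and costs for illustration.
workloads = [
    Workload("checkout-api", 1200.0, {"team": "payments"}),
    Workload("fraud-model", 800.0, {"team": "payments"}),
    Workload("search-indexer", 950.0, {"team": "search"}),
    Workload("legacy-cron", 300.0),
]
print(cost_by_team(workloads))
# {'payments': 2000.0, 'search': 950.0, 'unattributed': 300.0}
```

Tracking the size of the `unattributed` bucket over time is one simple way to tell whether attribution is good enough for teams to govern their own spend.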

Software at Scale 55 - Troubleshooting and Operating K8s with Ben Ofiri

2023-03-15 | 44:11

Ben Ofiri is the CEO and co-founder of Komodor, a Kubernetes troubleshooting platform.

Apple Podcasts | Spotify | Google Podcasts

We had an episode with the other founder of Komodor, Itiel, in 2021, and I thought it would be fun to revisit the topic.

Highlights (ChatGPT Generated)

[0:00] Introduction to the Software at Scale podcast and the guest speaker, Ben Ofiri, CEO and co-founder of Komodor.
- Discussion of why Ben decided to work on a Kubernetes platform and the potential impact of Kubernetes becoming the standard for managing microservices.
- Reasons why companies are interested in adopting Kubernetes, including the ability to scale quickly and cost-effectively, and the enterprise-ready features it offers.
- The different ways companies migrate to Kubernetes: either starting from a small team and gradually increasing usage, or through a strategic, top-down decision.
- The flexibility of Kubernetes is its strength, but it also comes with complexity that can lead to more time spent on alerts and managing incidents.
- The learning curve for developers to efficiently troubleshoot and operate Kubernetes can be steep and is a concern for many organizations.

[8:17] Tools for Managing Kubernetes.
- The challenges that arise when trying to operate and manage Kubernetes.
- DevOps and SRE teams become the bottleneck due to their expertise in managing Kubernetes, leading to frustration for other teams.
- A report by the cloud native observability organization found that one out of five developers felt frustrated enough to want to quit their job due to friction between different teams.
- Ben's idea for Komodor was to take the knowledge and expertise of the DevOps and SRE teams and democratize it across the entire organization.
- The platform simplifies operating, managing, and troubleshooting Kubernetes for every engineer in the company, from junior developers to the head of engineering.
- One of the most frustrating issues for customers is identifying which teams should care about which issues in Kubernetes; Komodor helps solve this with automated checks and reports that indicate, among other things, whether a problem is an infrastructure or an application issue.
- Komodor suggests actions to take but leaves the decision-making, and the responsibility for taking the action, to users.
- The platform allows users to track how many times they take an action and how useful it is, allowing for optimization over time.

[12:03] The Challenge of Balancing Standardization and Flexibility.
- Kubernetes provides a lot of flexibility, but this can lead to fragmented infrastructure and inconsistent usage patterns.
- Komodor aims to strike a balance between standardization and flexibility, allowing best practices and guidelines to be established while still allowing for customization and unique needs.

[16:14] Using Data to Improve Kubernetes Management.
- The platform tracks user actions and the effectiveness of those actions to make suggestions and fine-tune recommendations over time.
- The goal is to build a machine that knows what actions to take for almost all scenarios in Kubernetes, providing maximum benefit to customers.

[20:40] Why Kubernetes Doesn't Include All Management Functionality.
- Kubernetes is an open-source project with many different directions it can go in terms of adding functionality.
- Reliability, observability, and operational functionality are typically provided by vendors or cloud providers and not organically by the Kubernetes community.
- Different players in the ecosystem contribute different pieces to create a comprehensive experience for the end user.

[25:05] Keeping Up with Kubernetes Development and Adoption.
- How Komodor keeps up with Kubernetes development and adoption.
- The team is data-driven and closely tracks user feedback and needs, as well as new developments and changes in the ecosystem.
- The use and adoption of custom resources is a constantly evolving and rapidly changing area, requiring quick research and translation into product specs.
- The company hires deeply technical people, including those with backgrounds in DevOps and SRE, to ensure a deep understanding of the complex problem they are trying to solve.

[32:12] The Effects of the Economy on Komodor.
- The effects of the economic shift on Komodor.
- Companies must be more cost-efficient, leading to increased interest in Kubernetes and tools like Komodor.
- The pandemic has also highlighted the need for remote work and cloud-based infrastructure, further fueling demand.
- Komodor has seen growth as a result of these factors and believes it is well-positioned for continued success.

[36:17] The Future of Kubernetes and Komodor.
- Kubernetes will continue to evolve and be adopted more widely by organizations of all sizes and industries.
- The team is excited about the potential of rule engines and other tools to improve management and automation within Kubernetes.
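The highlights describe tracking how often users take a suggested action and how useful it turns out to be, then fine-tuning recommendations over time. A minimal sketch of that feedback loop, assuming a hypothetical log of (issue type, action taken, did it resolve the issue?) events, ranks actions per issue type by observed success rate. `CrashLoopBackOff` and `OOMKilled` are real Kubernetes states, but the action names and the log are invented, and this is not Komodor's implementation. Laplace smoothing, `(successes + 1) / (attempts + 2)`, keeps an action tried once with one lucky success from jumping straight to the top.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, bool]  # (issue type, action taken, did it resolve the issue?)


def rank_actions(events: Iterable[Event]) -> Dict[str, List[Tuple[str, float]]]:
    """Rank actions per issue type by smoothed success rate, highest first."""
    attempts: Dict[Tuple[str, str], int] = defaultdict(int)
    successes: Dict[Tuple[str, str], int] = defaultdict(int)
    for issue, action, resolved in events:
        attempts[(issue, action)] += 1
        if resolved:
            successes[(issue, action)] += 1

    ranked: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (issue, action), n in attempts.items():
        # Laplace smoothing pulls small samples toward 0.5.
        score = (successes[(issue, action)] + 1) / (n + 2)
        ranked[issue].append((action, score))
    for issue in ranked:
        ranked[issue].sort(key=lambda pair: pair[1], reverse=True)
    return dict(ranked)


# Made-up feedback log for illustration.
log = [
    ("CrashLoopBackOff", "rollback-deploy", True),
    ("CrashLoopBackOff", "rollback-deploy", True),
    ("CrashLoopBackOff", "rollback-deploy", False),
    ("CrashLoopBackOff", "restart-pod", False),
    ("CrashLoopBackOff", "restart-pod", False),
    ("OOMKilled", "raise-memory-limit", True),
]
print(rank_actions(log)["CrashLoopBackOff"])
# [('rollback-deploy', 0.6), ('restart-pod', 0.25)]
```

Consistent with the episode's point that users keep the final decision, a ranking like this would only order the suggestions shown, not trigger any action.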

Software at Scale 54 - Community Trust with Vikas Agarwal

2023-02-01 | 40:48

Vikas Agarwal is an engineering leader with over twenty years of experience leading engineering teams. We focused this episode on his experience as the Head of Community Trust at Amazon and dealing with the various challenges of fake reviews on Amazon products.

Apple Podcasts | Spotify | Google Podcasts

Highlights (GPT-3 generated)

[0:00:17] Vikas Agarwal's origin story.
[0:00:52] How Vikas learned to code.
[0:03:24] Vikas's first job out of college.
[0:04:30] Vikas's experience with the review business and community trust.
[0:06:10] Mission of the community trust team.
[0:07:14] How to start off with a problem.
[0:09:30] Different flavors of review abuse.
[0:10:15] The program for gift cards and fake reviews.
[0:12:10] Google search and FinTech.
[0:14:00] Fraud and ML models.
[0:15:51] Other things to consider when it comes to trust.
[0:17:42] Ryan Reynolds' funny review on his product.
[0:18:10] Reddit-like problems.
[0:21:03] Activism filters.
[0:23:03] Elon Musk's changing policy.
[0:23:59] False positives and appeals process.
[0:28:29] Stress levels and question mark emails from Jeff Bezos.
[0:30:32] Jeff Bezos' mathematical skills.
[0:31:45] Amazon's closed loop auditing process.
[0:32:24] Amazon's success and leadership principles.
[0:33:35] Operationalizing appeals at scale.
[0:35:45] Data science, metrics, and hackathons.
[0:37:14] Developer experience and iterating changes.
[0:37:52] Advice for tackling a problem of this scale.
[0:39:19] Striving for trust and external validation.
[0:40:01] Amazon's efforts to combat abuse.
[0:40:32] Conclusion.

Software at Scale 53 - Testing Culture with Mike Bland

2022-12-28 | 01:06:52

Mike Bland is a software instigator: he helped drive adoption of automated testing at Google, and the Quality Culture Initiative at Apple.

Apple Podcasts | Spotify | Google Podcasts

Mike’s blog was instrumental in my decision to pick a job in developer productivity/platform engineering. We talk about the Rainbow of Death, the idea of driving cultural change in large engineering organizations, which is one of the key challenges of platform engineering teams. And we deep dive into the value of automated testing and the common pushback against it.

Highlights (GPT-3 generated)

[0:00 - 0:29] Welcome
[0:29 - 0:38] Explanation of Rainbow of Death
[0:38 - 0:52] Story of Testing Grouplet at Google
[0:52 - 5:52] Benefits of Writing Blogs and Engineering Culture Change
[5:52 - 6:48] Impact of Mike's Blog
[6:48 - 7:45] Automated Testing at Scale
[7:45 - 8:10] "I'm a Snowflake" Mentality
[8:10 - 8:59] Instigator Theory and Crossing the Chasm Model
[8:59 - 9:55] Discussion of Dependency Injection and Functional Decomposition
[9:55 - 16:19] Discussion of Testing and Testable Code
[16:19 - 24:30] Impact of Organizational and Cultural Change on Writing Tests
[24:30 - 26:04] Instigator Theory
[26:04 - 32:47] Strategies for Leaders to Foster and Support Testing
[32:47 - 38:50] Role of Leadership in Promoting Testing
[38:50 - 43:29] Philosophical Implications of Testing Practices
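The segment on dependency injection and testable code concerns a pattern that often comes up in testing discussions: code that reaches directly for the system clock, network, or filesystem is hard to test, while code that receives those dependencies as parameters is easy. As a generic illustration, not an example taken from the episode, the hypothetical `is_expired` function below accepts a clock argument, so production code uses the real time and a test passes a fixed one.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    # The real dependency, used by default in production.
    return datetime.now(timezone.utc)


def is_expired(issued_at: datetime, ttl: timedelta, clock: Clock = system_clock) -> bool:
    """True once `ttl` has elapsed since `issued_at`.

    The clock is injected, so tests can supply a fake one and get
    deterministic results instead of depending on when they run.
    """
    return clock() - issued_at >= ttl


# A test supplies a fixed "now" instead of reading the system clock.
issued = datetime(2022, 12, 28, 12, 0, tzinfo=timezone.utc)
fake_now = lambda: issued + timedelta(minutes=30)
print(is_expired(issued, timedelta(minutes=15), clock=fake_now))
# True
print(is_expired(issued, timedelta(hours=1), clock=fake_now))
# False
```

The default argument keeps call sites unchanged, which lowers the cost of retrofitting this into existing code, a practical concern when driving testing adoption across a large organization.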

Software at Scale 52 - Building Build Systems with Benjy Weinberger

2022-11-17 | 01:02:57

Benjy Weinberger is the co-founder of Toolchain, a build tool platform. He is one of the creators of the original Pants, an in-house Twitter build system focused on Scala, and was the VP of Infrastructure at Foursquare. Toolchain now focuses on Pants 2, a revamped build system.

Apple Podcasts | Spotify | Google Podcasts

In this episode, we go back to the basics and discuss the technical details of scalable build systems, like Pants, Bazel and Buck. A common challenge with these build systems is that it is extremely hard to migrate to them and to have them interoperate with open source tools that are built differently. Benjy’s team redesigned Pants with an initial hyper-focus on Python to fix these shortcomings, in an attempt to create a third generation of build tools: one that easily interoperates with differently built packages but is still fast and scalable.

Machine-generated Transcript

[0:00] Hey, welcome to another episode of the Software at Scale podcast. Joining me here today is Benjy Weinberger, previously a software engineer at Google and Twitter, VP of Infrastructure at Foursquare, and now the founder and CEO of Toolchain. Thank you for joining us.

Thanks for having me. It's great to be here.

Yes. Right from the beginning, I saw that you worked at Google in 2002, which is forever ago, like 20 years ago at this point. What was that experience like? What kind of change did you see as you worked there for a few years?

[0:37] As you can imagine, it was absolutely fascinating. And I should mention that while I was at Google from 2002, that was not my first job. I have been a software engineer for over 25 years, so there were five years before that where I worked at a couple of companies. I was living in Israel at the time, so my first job out of college was at Check Point, which was a big, successful network security company. And then I worked for a small startup. And then I moved to California and started working at Google.
And so I had the experience that I think many people had in those days, and many people still do: the work you're doing is fascinating, but the tools you're given to do it with as a software engineer are not great. I'd had five years of experience struggling with builds being slow, builds being flaky, with everything requiring a lot of effort. There was almost a hazing-ritual quality to it. Like, this is what makes you a great software engineer: struggling through the mud and through the quicksand with this awful, substandard tooling. And we are not users, we are not people for whom products are meant, right? We make products for other people. Then I got to Google.

[2:03] And Google, when I joined, was actually struggling with a massive, very slow Makefile that took forever to parse, let alone run. But the difference, which I had not seen anywhere else, was that Google paid a lot of attention to this problem and devoted a lot of resources to solving it. Google was the first place I'd worked, and I think it is still in many ways the gold standard, where developers are first-class participants in the business and deserve the best products and the best tools, and if there's nothing out there for them to use, we will build it in-house and put a lot of energy into that. And so it was, for me, specifically as an engineer...

[2:53] A big part of watching that growth in the early to late 2000s was the growth of engineering process and best practices and the tools to enforce it. The thing I personally am passionate about is build and CI, but I'm also talking about code review tools and all the tooling around source code management and revision control, and just everything to do with engineering process. It really was an object lesson, very, very fascinating, and it inspired a big chunk of the rest of my career.

I've heard all sorts of things, like Python scripts that had to generate Makefiles, and finally they moved the Python to the first version of Blaze. So it's a fascinating history.

[3:48] Maybe you can tell us one example of something paradigm-changing that you saw, something that created an order-of-magnitude difference in your experience there, and maybe your first aha moment on how good developer tools can be?

[4:09] Sure. I had been used to using Make basically up until that point. And Google, as you mentioned, was using Make and really squeezing everything it was possible to squeeze out of that lemon and then some.

[4:25] But when the very early versions of what became Blaze came along (the big internal build system that inspired Bazel, which is the open-source variant of it today), one thing that really struck me was the integration with the revision control system, which was, and I think still is, Perforce. I imagine many listeners are very familiar with Git. Perforce is very different. I can only partly remember all of its intricacies, because it's been so long since I've used it. But one interesting aspect of it was that you could do partial checkouts. It really was designed for giant codebases. There was this concept of partial checkouts where you could check out just the bits of the code that you needed. But of course, then the question is, how do you know what those bits are? And of course the build system knows, because the build system knows about dependencies.
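Benjy's point is that the build system can decide which bits of code a partial checkout needs, because it already knows the dependency graph. A minimal sketch of that computation is a graph traversal from the targets you are working on. The target names and graph below are invented, and this is not how Blaze or Perforce actually implemented the integration. Inverting the same graph also sketches the "minimal amount of work" question Benjy raises later in the conversation: the targets reachable from a changed target in the reversed graph are the ones a change could affect.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # target -> its direct dependencies


def transitive_deps(graph: Graph, roots: Iterable[str]) -> Set[str]:
    """The roots plus everything they depend on, directly or indirectly.

    Breadth-first search with a `seen` set visits each target once, even when
    several targets share a dependency.
    """
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        target = queue.popleft()
        if target in seen:
            continue
        seen.add(target)
        queue.extend(graph.get(target, []))
    return seen


def reverse_graph(graph: Graph) -> Graph:
    """Invert edges so each target maps to the targets that depend on it."""
    rev: Graph = {target: [] for target in graph}
    for target, deps in graph.items():
        for dep in deps:
            rev.setdefault(dep, []).append(target)
    return rev


# Made-up monorepo targets for illustration.
deps: Graph = {
    "ads/server": ["ads/ranking", "base/rpc"],
    "ads/ranking": ["base/strings"],
    "search/frontend": ["base/rpc"],
    "base/rpc": ["base/strings"],
    "base/strings": [],
}

# What a partial checkout for ads/server would need.
print(sorted(transitive_deps(deps, ["ads/server"])))
# ['ads/ranking', 'ads/server', 'base/rpc', 'base/strings']

# Which targets a change to ads/ranking could affect.
print(sorted(transitive_deps(reverse_graph(deps), ["ads/ranking"])))
# ['ads/ranking', 'ads/server']
```

Note that `search/frontend` never appears in the checkout for `ads/server`, which is exactly the saving a partial checkout is meant to deliver on a giant codebase.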
And so there was this integration, this back and forth between the [5:32] Perforce client and the build system, that was very creative and very effective. It allowed you to have locally on your machine only the code you actually needed to work on your piece of the codebase: basically the files you cared about and all of their transitive dependencies. To me that was a very creative solution to a problem, one that involved some lateral thinking about how seemingly unrelated parts of the toolchain could interact. That made me realize, oh, there's a lot of creative thought at work here, and I love it.

[6:17] Yeah, that makes sense. I interned there way back in 2016, and I remember that by mistake I ran a grep across the codebase and it just took forever. That's when I realized none of this stuff is local. First of all, half the source code isn't even checked out to my machine, and my poor grep command is trying to check it out. But I was also fascinated by how seamlessly it all worked behind the scenes most of the time. Did you start working on developer tools then? Or is that just what inspired you to think about developer tools?

I did not work on developer tools at Google. I worked on ads and search and other Google products, but I was a big user of the developer tools. The one exception was that I made some contributions to the [7:21] Protocol Buffer compiler, which I think many people are familiar with. It's a very deep part of the toolchain that is integrated into everything there, so that gave me some experience with what it's like to hack on a tool that every engineer is using, one that is a very deep part of their workflow. But it wasn't until after Google, when I went to Twitter, [7:56] that I noticed that in my time at Google, the rest of the industry had not caught up. Suddenly I was thrust ten years into the past, back to using very slow, very clunky, flaky tools that were not designed for the tasks we were trying to use them for. That made me realize: wait a minute, I spent eight years using these great tools, and they don't exist outside of these giant companies. I sort of assumed that Microsoft, Amazon, and some other giants probably have similar internal tools, but is there something out there for everyone else?

That's when I started hacking on the problem more directly, at Twitter, together with John, who is now my co-founder at Toolchain. He was actually ahead of me, and ahead of the game at Twitter; he had already begun working on some solutions, and I joined him in that.

Could you describe some of the problems you ran into? Were the builds just taking forever, or was there something else?

[9:09] So there were...

[9:13] A big part of the problem was that at Twitter at the time, the codebase John and I were interested in was using Scala. Scala is a fascinating, very rich language.

[9:30] Its compiler is very slow. We were in a situation where you'd make some small change to a file and then builds would take 10 minutes, 20 minutes, 40 minutes. The iteration time on your desktop was incredibly slow. And CI times, where there was CI in place, were also incredibly slow because of the huge amount of repetitive or near-repetitive work. This is because the build tools, etc., were pretty naive about understanding what work actually needs to be done given a set of changes. There's been a ton of work specifically on SBT since then.

[10:22] It has incremental compilation and things like that, but that still doesn't really scale well to the large corporate codebases that people often refer to as monorepos. If you don't want to fragment your codebase, with all of the immense problems that brings, you end up needing tooling that can handle that situation. Some of the biggest challenges are: how do I do less than recompile the entire codebase every time? How can tooling help me be smart about the correct minimal amount of work to do [11:05] to make compiling and testing as fast as they can be?

[11:12] And I should mention that I dabbled in this problem at Twitter with John. It was when I went to Foursquare that I really got into it, because Foursquare similarly had a big Scala codebase with a very similar problem of incredibly slow builds.

[11:29] The interim
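Both ideas in this part of the conversation, checking out only the code a target transitively needs and rebuilding only what a change can affect, come down to walking a build dependency graph in one direction or the other. Below is a minimal Python sketch under stated assumptions: the target names and the in-memory `DEPS` graph are hypothetical, and this is not how Blaze, Bazel, Pants, or Perforce actually implement either feature.

```python
from collections import deque

# Hypothetical build graph: each target maps to its direct dependencies.
DEPS: dict[str, set[str]] = {
    "//search:server": {"//search:ranker", "//base:logging"},
    "//search:ranker": {"//proto:query", "//base:strings"},
    "//ads:server": {"//proto:query", "//base:logging"},
    "//proto:query": set(),
    "//base:logging": {"//base:strings"},
    "//base:strings": set(),
}


def closure(roots: set[str], edges: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk: the roots plus everything reachable from them via `edges`."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the graph: dependency -> targets that depend on it directly."""
    rev: dict[str, set[str]] = {node: set() for node in edges}
    for node, deps in edges.items():
        for dep in deps:
            rev.setdefault(dep, set()).add(node)
    return rev


def targets_to_check_out(targets: set[str]) -> set[str]:
    """Partial checkout: the targets you work on plus their transitive dependencies."""
    return closure(targets, DEPS)


def targets_to_rebuild(changed: set[str]) -> set[str]:
    """Minimal rebuild/retest set: everything that transitively depends on a change."""
    return closure(changed, reverse(DEPS))


if __name__ == "__main__":
    print(sorted(targets_to_check_out({"//search:server"})))
    print(sorted(targets_to_rebuild({"//proto:query"})))
```

In this toy graph, working on `//search:server` needs five targets but never touches `//ads:server`, and editing `//proto:query` forces a rebuild of both servers but leaves the `//base` targets alone. Real tools add file-level granularity, caching, and remote execution on top of this basic graph walk.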

Software at Scale 51 - Usage based Pricing with Puneet Gupta

2022-10-13 · 01:05:05

Puneet Gupta is the co-founder and CEO of Amberflo, a cloud metering and usage-based pricing platform.

Apple Podcasts | Spotify | Google Podcasts

In this episode, we discuss Puneet’s fascinating background: his early years at AWS as a GM and his experience at Oracle Cloud. We start with why AWS shipped S3 as its first product, before any other service. Then we go over the cultural differences between AWS and Oracle, and how usage-based pricing and sales tied into each organization’s culture and efficiency.

Our episode covers the different ways organizations align themselves better when pricing is tied directly to customers' usage metrics. We discuss how SaaS subscription models are simply a reworking of traditional software licenses, how vendors can dispel fears of overages under dynamic pricing models, and even why Netflix should be a usage-based-priced service :-)

We don’t have show notes, but I thought it would be interesting to link the initial PR newsletter for S3’s launch, to reflect on how completely our industry has changed over the last few years.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
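To make "pricing tied directly to usage" concrete, here is a minimal sketch of a graduated (tiered) usage bill with an optional spend cap, which is one way a vendor might ease fears of overages. The tier boundaries, per-unit prices, and cap are hypothetical numbers chosen for illustration; this is not Amberflo's model or any vendor's real price list.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical graduated tiers: (upper bound in units, price per unit within that tier).
# An upper bound of None means "all remaining usage".
TIERS: list[tuple[Optional[int], Decimal]] = [
    (1_000, Decimal("0.00")),    # first 1,000 units are free
    (10_000, Decimal("0.01")),   # units 1,001 through 10,000
    (None, Decimal("0.005")),    # every unit beyond 10,000
]


def usage_bill(units: int, tiers=TIERS, cap: Optional[Decimal] = None) -> Decimal:
    """Price each slice of usage at its own tier's rate, then apply an optional spend cap."""
    total = Decimal("0")
    lower = 0
    for upper, price in tiers:
        if units <= lower:
            break
        top = units if upper is None else min(units, upper)
        total += (top - lower) * price
        if upper is None:
            break
        lower = upper
    if cap is not None:
        total = min(total, cap)  # the customer never pays more than the agreed cap
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(usage_bill(12_000))                      # 1,000 free + 9,000 at 0.01 + 2,000 at 0.005
    print(usage_bill(12_000, cap=Decimal("50")))   # same usage, capped
```

Graduated pricing charges each slice at its own rate, so crossing a tier boundary never makes earlier units more expensive; the alternative "volume" model prices all units at the rate of the final tier reached, which can produce sudden jumps in the bill.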

Software at Scale 50 - Redefining Labor with Akshay Buddiga

2022-09-08 · 01:15:46

Akshay Buddiga is the co-founder and CTO of Traba, a labor management platform.

Apple Podcasts | Spotify | Google Podcasts

Sorry for the long hiatus in episodes! Today’s episode covers a myriad of interesting topics: being the star of one of the internet’s first viral videos, experiencing hyper-growth at the somewhat controversial Zenefits, scaling out the technology platform at Fanatics, starting a company, picking an accelerator, permitting only in-person work, facilitating the career growth of gig workers, and more!

Highlights
[0:00] - The infamous Spelling Bee incident.
[06:30] - Why pivot to Computer Science after an undergraduate focus in biomedical engineering?
[09:30] - Going to Stanford for Management Science and getting an education in Computer Science.
[13:00] - Zenefits during hyper-growth. Learning from Parker Conrad.
[18:30] - Building an e-commerce platform with reasonably high scale (powering all NFL gear) as a first software engineering gig. Dealing with lots of constraints from the beginning, like multi-currency support, and delivering a complete solution over several years. The interesting seasonality, like Game 7 of the NBA finals, and its implications for the software engineers maintaining e-commerce systems. Watching all the Super Bowls with coworkers.
[26:00] - A large outage, obviously due to DNS routing.
[31:00] - Why start a company?
[37:30] - Why join OnDeck?
[41:00] - Contrary to the current trend, Traba only allows in-person work. Why is that? We go on to talk about the implications of remote work and other decisions for an early startup’s product velocity.
[57:00] - On being competitive.
[58:30] - Velocity is really about not working on the incorrect stuff.
[68:00] - What’s next for Traba? What’s the vision?
[72:30] - Building two-sided marketplaces, and the career path for gig workers.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Software at Scale 49 - State Management with James Cowling

2022-06-23 · 53:18

James Cowling is the co-founder of Convex, a state management platform for web developers.

Apple Podcasts | Spotify | Google Podcasts

We discuss the state of web development in the industry today, and the various approaches to making it easier. Contrasting the Hasura and Convex approaches is a good way to illustrate some of the ideas. Hasura lets you skip the web app and run queries against the database through GraphQL. Convex, on the other hand, helps you stop worrying about databases: no setup or scaling concerns. It’s interesting to see how various systems are evolving to help developers cut the busywork across more and more layers of the stack and focus on delivering business value instead.

Convex also excels at developer experience: it provides a deep integration with React, uses hooks (just like Apollo GraphQL), and seems to have a fully typed (and therefore auto-completable) SDK. I expect more companies will move “up the stack” to provide deeper integrations with popular tools like React.

Episode Reading List
* The co-founders of this company led Dropbox’s Magic Pocket project.
* Convex → Netlify
* Convex vs. Firebase
* Prisma

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Software at Scale 48 - API Gateway Management with Josh Twist

2022-06-09 · 49:36

Josh Twist is the co-founder and CEO of Zuplo, a programmable, developer-friendly API Gateway Management Platform.

Apple Podcasts | Spotify | Google Podcasts

We discuss a new category of developer tools startups: API Gateway Management Platforms. We go over what an API Gateway is, why companies use gateways, common pain points in gateway management, and building reliable systems that serve billions of requests at scale. But most importantly, we dive into the story of Josh’s UK Developer of the Year 2009 award.

Recently, I’ve been working on the Vanta API and was surprised at how poor the performance and developer experience of Amazon’s API Gateway are. It has poor support for rate limiting, and very high edge latency. So I’m excited for a new crop of companies to provide good solutions in this space.

Episode Reading List
* Amazon’s API Gateway
* Stripe’s API - The first ten years
* Envoy

The Award

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
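Rate limiting, singled out above as a weak spot, is commonly implemented with a token bucket: each client gets a bucket that holds up to `capacity` tokens, refills at `rate` tokens per second, and each request spends one token. The sketch below is a minimal single-process version with an injectable clock so it can be tested deterministically; it is not how Zuplo or Amazon API Gateway implement limits, and a real gateway would keep one bucket per API key in shared storage across nodes.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock for the demo
    bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(4)])  # burst of 3 allowed, 4th rejected
    fake_now[0] += 1.0                          # one second later, one token is back
    print(bucket.allow())
```

The capacity controls how large a burst a client may send, while the rate sets the sustained throughput; tuning the two separately is the main reason gateways favor this scheme over a fixed per-minute counter.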

Software at Scale 47 - OpenTelemetry with Ted Young

2022-05-26 · 01:33:41

Ted Young is the Director of Developer Education at Lightstep and a co-founder of the OpenTelemetry project.

Apple Podcasts | Spotify | Google Podcasts

This episode dives deep into the history of OpenTelemetry, why we need a new telemetry standard, all the work that goes into building generic telemetry processing infrastructure, and the vision for unified logging, metrics, and traces.

Episode Reading List
Instead of highlights, I’ve attached links to some of our discussion points.
* HTTP Trace Context - new headers that provide a standard way to propagate trace state across HTTP requests.
* OpenTelemetry Data Collection
* Zipkin
* OpenCensus and OpenTracing - the precursor projects to OpenTelemetry

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
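The HTTP Trace Context item refers to the W3C Trace Context headers. Its `traceparent` header (version `00`) has four dash-separated lowercase-hex fields: version, a 32-digit trace ID, a 16-digit parent span ID, and 2 digits of flags, where bit `0x01` means "sampled". The Python sketch below parses such a header and builds the one a service would forward to the next hop. It covers only basic validation, handles only version `00`, and ignores `tracestate`; in real code an OpenTelemetry SDK propagator does this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, lowercase hex only (version 00 layout).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the span context carried by the header, or None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return SpanContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Header for an outgoing call: same trace ID, a fresh span ID for this hop."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex digits
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
    print(ctx)
    print(child_traceparent(ctx))
```

Keeping the trace ID constant while replacing the parent ID at every hop is what lets a backend such as Zipkin stitch spans from different services into one trace.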

Software at Scale 46 - Authorization with Or Weis

2022-05-10 · 49:05

Or Weis is the CEO and founder of Permit.io, a Permission as a Service platform. Previously, he founded Rookout, a cloud-debugging tool.

Apple Podcasts | Spotify | Google Podcasts

Many of us have struggled (or are struggling) with permission management in the various applications we’ve built. The complexity of these systems always tends to grow with business requirements: for example, some content should only be accessible to paid users, or to users in a certain geography. Certain architectures, like filesystems, have hierarchical permissions that need to be evaluated efficiently, and there’s often technical complexity that is unique to the specific application.

In this episode we talk about all the complexity around permission management and techniques to tame it. We also explore how Permit tries to solve this as a product and abstract the problem away for everyone.

Highlights
[0:00] - Why work on access control?
[02:00] - Sources of complexity in permission management
[08:00] - Which cloud system manages permissions well?
[11:00] - Product-izing a solution to this problem
[17:00] - What kind of companies approach you for solutions to this problem?
[22:00] - Why are there research papers written about permission management?
[38:00] - Permission management across the technology stack (inter-service communication)
[42:00] - What are you excited about building next?

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
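As a small illustration of the two sources of complexity mentioned above, hierarchical resources and attribute-based business rules such as "paid users only" or "users in a certain geography", here is a hypothetical Python sketch. The grant table, resource paths, and user attributes are invented for the example; this is not Permit.io's API or policy model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class User:
    name: str
    plan: str     # hypothetical attribute: "free" or "paid"
    country: str  # hypothetical attribute: ISO country code


# Hypothetical grants: (user, path) -> actions allowed on that path and everything below it.
GRANTS: dict[tuple[str, str], set[str]] = {
    ("alice", "/projects"): {"read"},
    ("alice", "/projects/apollo"): {"read", "write"},
    ("bob", "/projects/apollo/docs"): {"read"},
}


def granted(user: User, path: str, action: str) -> bool:
    """Hierarchical check: a grant on any ancestor applies to all of its descendants."""
    p = PurePosixPath(path)
    for node in (p, *p.parents):  # the path itself, then each ancestor up to "/"
        if action in GRANTS.get((user.name, str(node)), set()):
            return True
    return False


def allowed(user: User, path: str, action: str, premium: bool = False,
            allowed_countries: frozenset[str] = frozenset()) -> bool:
    """Combine the hierarchical grant with attribute-based business rules."""
    if not granted(user, path, action):
        return False
    if premium and user.plan != "paid":
        return False
    if allowed_countries and user.country not in allowed_countries:
        return False
    return True


if __name__ == "__main__":
    alice = User("alice", "paid", "US")
    print(allowed(alice, "/projects/apollo/src/main.py", "write"))  # inherited from /projects/apollo
    print(allowed(alice, "/projects/gemini/plan.md", "write"))      # only "read" on /projects
```

Each check walks every ancestor of the path, so the cost grows with tree depth and the number of rules; that is one reason hierarchical models at scale tend to precompute or index effective permissions rather than evaluate them naively.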

Software at Scale 45 - Q/A with Jon Skeet

2022-04-20 · 50:17

Jon Skeet is a Staff Developer Platform Engineer at Google, working on Google Cloud Platform client libraries for .NET. He's best known for his contributions to Stack Overflow as well as his book, C# in Depth. He is also the primary maintainer of the Noda Time date/time library for .NET. You may also be interested in Jon Skeet Facts.

Apple Podcasts | Spotify | Google Podcasts

We discuss the intricacies of time zones, how to (attempt to) store time correctly, why storing UTC is not a silver bullet, asynchronous help on the internet, the implications of new tools like GitHub Copilot, remote work, Jon’s upcoming book on software diagnostics, and more.

Highlights
[01:00] - What exactly is a Developer Platform Engineer?
[05:00] - Why is date and time management so tricky?
[13:00] - How should I store my timestamps? We discuss reservation systems, leap seconds, time zone changes, and more.
[21:00] - Stack Overflow, software development, and more.
[27:00] - Software diagnostics
[32:00] - The evolution of Stack Overflow
[34:00] - Remote work for software developers
[41:00] - GitHub Copilot and the future of software development tools
[44:00] - What’s your most controversial programming opinion?

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
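One concrete reason storing UTC is not a silver bullet: for a future event booked in local time, converting to UTC at booking time bakes in today's time zone rules, and those rules can change before the event happens. The Python sketch below simulates such a rule change with two hypothetical fixed UTC offsets instead of a real time zone database. In practice you would store the local wall-clock time plus an IANA zone ID and resolve it with something like Python's `zoneinfo` module (or Noda Time in .NET) when needed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rules for one zone: the UTC offset changes after the meeting is booked.
OFFSET_WHEN_BOOKED = timedelta(hours=3)
OFFSET_AFTER_CHANGE = timedelta(hours=2)

# The user books a meeting for 09:00 local time on a future date.
wall_clock = datetime(2030, 3, 1, 9, 0)  # naive datetime: a local wall-clock time


def local_to_utc(local: datetime, offset: timedelta) -> datetime:
    return local.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)


def utc_to_local(instant: datetime, offset: timedelta) -> datetime:
    return instant.astimezone(timezone(offset)).replace(tzinfo=None)


# Approach A: convert to UTC at booking time and store only that instant.
stored_utc = local_to_utc(wall_clock, OFFSET_WHEN_BOOKED)         # 06:00 UTC
shown_later_a = utc_to_local(stored_utc, OFFSET_AFTER_CHANGE)     # 08:00 local: wrong

# Approach B: store the wall-clock time (plus a zone ID) and resolve it with current rules.
resolved_utc_b = local_to_utc(wall_clock, OFFSET_AFTER_CHANGE)    # 07:00 UTC
shown_later_b = utc_to_local(resolved_utc_b, OFFSET_AFTER_CHANGE) # 09:00 local: as booked

if __name__ == "__main__":
    print(shown_later_a.time(), shown_later_b.time())
```

Neither approach is universally right: UTC instants are the natural choice for things that already happened (log entries, payments), while future events defined by a local wall-clock time usually need the local time and zone ID preserved.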

Software at Scale 44 - Building GraphQL with Lee Byron

2022-03-22 · 01:04:33

Lee Byron is the co-creator of GraphQL, a senior engineering manager at Robinhood, and the executive director of the GraphQL Foundation.

Apple Podcasts | Spotify | Google Podcasts

We discuss the GraphQL origin story, early technical decisions at Facebook, the experience of deploying GraphQL today, and the future of the project.

Highlights (some tidbits)

[01:00] - The origin story of GraphQL.

Initially, the Facebook application was an HTML web-view wrapper. It seemed like the right choice at the time: the iPhone had been released without an App Store, Steve Jobs called it an “internet device”, and Android phones came out soon after, along with Chrome, a brand-new browser. But the application had horrendous performance and high crash rates, used a lot of RAM on devices, and its animations would lock the phone up. Zuckerberg called the bet Facebook’s biggest mistake. The idea was to rebuild the app from scratch using native technologies. A team built a prototype of the news feed, but they quickly realized that there weren’t any clean APIs to retrieve data in a format palatable to phones; the relevant APIs all returned HTML. But Facebook had a nice ORM-like library in PHP to access data quickly, and there was a parallel effort to speed up the application using this library. Another project aimed to declare data requirements for this ORM declaratively, for better performance and a better developer experience.

Another factor was that mobile data networks were pretty slow, and a chatty REST API for the news feed would have meant many slow round trips and tens of seconds to load the feed. So GraphQL started off as a little library, originally called SuperGraph, that could make declarative calls to the PHP ORM library from external sources.
Finally, the last piece was to make the language strongly typed, drawing on lessons from other RPC frameworks like gRPC and Thrift.

[16:00] - There weren’t any data loaders or similar pieces at the time. GraphQL has generally been agnostic to how the data actually gets loaded, and there are plugins to handle things like efficient data loading, authorization, etc. Also, Facebook didn’t need data loading, since its internal ORM handled de-duplication, so it didn’t need to be built until there was sufficient external feedback.

[28:00] - GraphQL for public APIs: what to keep in mind. Query costing, and other differences from REST.

[42:00] - GraphQL as an open-source project

[58:00] - The evolution of the language, and the new features Lee is most excited about, like client-side nullability.

Client-side nullability is an interesting proposal in which clients can explicitly state how important retrieving a certain field is and, on the flip side, allow partial failures for fields that aren’t critical.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
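The core idea in the origin story, a client declaring exactly which fields a screen needs so the whole news feed can be fetched in one round trip instead of many chatty REST calls, can be illustrated with a toy resolver. This is a hypothetical Python sketch over plain dicts, not the GraphQL specification's execution algorithm or Facebook's implementation; real servers parse a query string, validate it against a typed schema, and call a resolver per field.

```python
from typing import Any

# Hypothetical in-memory data; each story carries fields the client may not want.
DATA = {
    "viewer": {
        "name": "Ada",
        "email": "ada@example.com",
        "feed": [
            {"id": 1, "text": "Hello", "likes": 3, "author": {"name": "Bob", "bio": "Writes code"}},
            {"id": 2, "text": "GraphQL!", "likes": 7, "author": {"name": "Cy", "bio": "Draws"}},
        ],
    }
}


def resolve(value: Any, selection: dict[str, Any]) -> Any:
    """Keep only the selected fields.

    `selection` maps a field name to a nested selection dict, or to None for a leaf field.
    Lists are resolved element by element, the way GraphQL treats list fields.
    """
    if isinstance(value, list):
        return [resolve(item, selection) for item in value]
    return {
        field: value[field] if sub is None else resolve(value[field], sub)
        for field, sub in selection.items()
    }


# Roughly the shape of the query: { viewer { name feed { text author { name } } } }
query = {"viewer": {"name": None, "feed": {"text": None, "author": {"name": None}}}}
result = resolve(DATA, query)

if __name__ == "__main__":
    print(result)
```

The response mirrors the shape of the request and omits everything unselected (email, likes, bios), which is what keeps payloads small on slow mobile networks.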

Software at Scale 43 - Growth at Loom with Harshyt Goel

2022-03-01 · 43:58

Harshyt Goel is a founding engineer and engineering manager of Platform and Integrations at Loom, a video-messaging tool for workplaces. He’s also an angel investor, so if you’re looking for startup advice, investments, hiring advice, or a software engineering job, please reach out to him on Twitter.

Apple Podcasts | Spotify | Google Podcasts

We discuss Loom’s story, from when it had six people and a completely different product to the unicorn it is today. We focus on driving growth, complicated product launches, and successfully launching the Loom SDK.

Highlights
[00:30] - How it all began
[03:00] - Who is a founding engineer? Coming from Facebook to a 5-person startup
[06:00] - Company inflection points.
[10:30] - Pricing & packaging iterations.
[14:30] - Running growth for a freemium product, and the evolution of growth efforts at Loom
[30:00] - Summing up the opportunities unlocked by a growth team
[33:00] - Sometimes, reducing user friction isn’t what you want.
[34:30] - The Loom SDK, from idea to launch.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Software at Scale 42 - Daniel Stenberg, founder of curl

2022-02-10 · 46:40

Daniel Stenberg is the founder and lead developer of curl and libcurl.

Apple Podcasts | Spotify | Google Podcasts

This episode, along with others like this one, reminds me of this XKCD:

We dive into all the complexity of transferring data across the internet.

Highlights
[00:30] - The complexity behind HTTP. What goes on behind the scenes when I make a web request?
[11:30] - The organizational work behind internet-wide RFCs, like HTTP/3.
[20:00] - Rust in curl. The developer experience, and the overall experience of integrating Hyper.
[30:00] - WebSocket support in curl
[34:00] - Fostering an open-source community.
[38:00] - People around the world think Daniel has hacked their system, because the curl license is often included in malicious tools.
[41:00] - Does curl have a next big thing?

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
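To give a feel for the lowest layer of "what goes on when I make a web request", here is a Python sketch that builds a minimal HTTP/1.1 request message and splits a raw response into status code, headers, and body. It deliberately skips DNS, TCP, TLS, chunked transfer encoding, redirects, repeated headers, and the countless edge cases curl handles; the host name and response bytes are made-up examples.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """A minimal HTTP/1.1 GET: request line, headers, then an empty line, all CRLF-terminated."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",          # required in HTTP/1.1
        "User-Agent: sketch/0.1",
        "Accept: */*",
        "Connection: close",      # ask the server to close the connection after responding
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a complete, non-chunked response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.1 200 OK" -> 200
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a repeated header overwrites the earlier one here.
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


if __name__ == "__main__":
    print(build_request("example.com", "/index.html"))
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
    print(parse_response(raw))
```

Even this toy has to care about CRLF line endings, case-insensitive header names, and where the headers end and the body begins; multiply that by HTTP/2 framing, HTTP/3 over QUIC, proxies, and authentication, and the scope of a tool like curl becomes clearer.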

Software at Scale 41 - Minimal Entrepreneurship with Sahil Lavingia

2022-01-25 · 59:05

Sahil Lavingia is the founder of Gumroad, an e-commerce platform that helps you sell digital services. He also runs SHL Capital, a rolling fund for early-stage startups.

Apple Podcasts | Spotify | Google Podcasts

Sahil’s recent book, The Minimalist Entrepreneur, explores a framework for building profitable, sustainable companies. I’ve often weighed the trade-off between software engineering and trying to build and launch my own company, so this conversation takes up that theme and explores what it means for a software engineer to be a minimalist entrepreneur.

Highlights (edited)

Utsav: Let’s talk about VCs (referencing your popular blog post “Reflecting on My Failure to Build a Billion-Dollar Company”). Are startups pushed to grow faster and faster due to VC dynamics, or is there something else going on behind the scenes?

It’s a combination of things. People who get caught up in the anti-VC mentality are missing larger forces at play, because I don't really think it's just VCs who are making all of these things happen. Firstly, there’s definitely a status game being played. When I first moved to the Bay Area, as soon as you mentioned you were working on your own thing, the first questions people asked were how far along your company was, who you had raised money from, and how many employees you had, and then they compared you with other people they knew. You can’t really get too upset at that, since that’s the nature of the people coming to a boomtown like San Francisco. The way I think about it, there’s a high failure rate in trying to build a billion-dollar company, so you want to find out reasonably quickly whether you will succeed or not.

Secondly, we’re in a very unusual industry, where equity is basically the primary source of compensation. 90% of Americans don’t have any equity component in the businesses they work for, but giving equity has a ton of benefits. It’s great to have that alignment, and folks who take an early risk on your company should get rewarded.
The downside of equity is that it creates a very strong desire and incentive to make your company as valuable as possible, as quickly as possible. For your equity to be considered valuable by investors, you need to grow quickly, because investors use models that project your growth rate to arrive at your valuation.

Many people took my blog post to mean that it’s the VCs’ fault, but that’s not true. The VCs let me do what I wanted; they don’t really have that much power. The issue was that for employees to see a large outcome, you need the company to have a large exit. As a founder, you’d do pretty well if the company sold for $50 million, but that’s not true for employees. They really need the thing to work; otherwise, the best ones can just go work for the next Stripe. So you have this winner-take-all dynamic for employees, and it’s ultimately why I ended up shrinking the company to just me for a while.

Utsav: So do you give employees equity in the minimalist entrepreneurship framework?

Firstly: avoid hiring anyone for as long as possible, until you know you have some kind of product-market fit. I think it depends on your liquidity strategy. How are you, as a founder, going to make money from this business? The way you incentivize your employees should align with that. If you want to sell your company for a hundred million dollars, consider sharing that outcome by giving equity. If you plan to create a cash-cow business, consider profit sharing.

Utsav: What, if any, is the difference between indie hacking and minimalist entrepreneurship?

They’re pretty similar. Indie hacker seems like a personality, perhaps similar to a digital nomad, where the lifestyle takes precedence. I went to MicroConf in Las Vegas, and the attendees’ goals were fairly consistent: to buy a nice house and spend more time with their families.
In that case, your goal should be to build the most boring but profitable business possible, for a community you don’t particularly care about, because your goals have nothing to do with serving that community, which is totally fine. No value judgments from me. Indie hacking seems more geared toward independence. I tried living the digital nomad life: work solo, travel the world, no schedule. But I didn’t actually enjoy it; it wasn’t really satisfying. I like working on a project with many people, seeing things improve, learning from others while they learn from me, and talking frequently with my customers, whose lives are getting better because of my work. I enjoy that. So I wanted a middle ground between the “live on a beach” mentality and the blitzscaling, build-the-next-Facebook mentality. I like to think that with things like crowdfunding, this will get more and more feasible.

Even though my article went viral and its ideas often resonated, there’s an aspirational aspect to many humans: they want to build something amazing and big. It’s kind of the Steve Jobs “make a dent in the universe” idea, even though he might not have actually said that. To account for that, I think incorporating some indie hacker principles into the startup path might actually be the most applicable and accessible solution for people.

Utsav: One of the key ideas in the book that stands out to me as a software engineer is that you can keep trying projects on the side. Eventually, if you're doing things right and talking to customers, you will hit something that people want to buy or use, right? You're probably not going to get it right the first time. But I think that's a really important idea. Could you elaborate on that?

There are two kinds of people. The first builds a lot of stuff but doesn’t know who it’s for: another to-do list app, a meditation app, you name it.
So you build it, but then you can’t figure out who’ll use it. The other kind is stuck in analysis paralysis and can’t really home in on an idea they want to commit to. The solution for both of these personas is to forget about business, immerse yourself in the communities you care about, and try to help them. Focus on contributing to these communities. These could be Slack or Discord communities. For me, it was Hacker News, Dribbble, and Indie Hackers. There are subreddits for everything.

Start being a part of these communities, first by listening and eventually by contributing. I can guarantee that if you become a useful part of the community and share ideas, people will come up to you and talk about problems they’re facing. For example, they’re getting paid by YouTube to produce fitness videos, but have to wait until the end of the month, and they’d really like to get paid instantly. Once a community trusts you and you solve a problem for a specific set of people, you can quickly validate good ideas and deliver value. Iterating over ideas with that community gives you a good chance of success.

Listen to the audio for the full interview!

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
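The founder-versus-employee point about a $50 million sale is easy to see with rough numbers. The ownership stakes below are hypothetical, expressed in basis points (1 bp = 0.01%) to keep the arithmetic in integers, and the calculation ignores dilution, liquidation preferences, taxes, and vesting, all of which usually make the employee side look worse.

```python
def payout(exit_value: int, stake_bps: int) -> int:
    """Pre-tax proceeds in dollars for a stake given in basis points (1 bp = 0.01%)."""
    return exit_value * stake_bps // 10_000


EXIT_VALUE = 50_000_000  # the $50 million sale mentioned in the interview
FOUNDER_BPS = 3_000      # hypothetical 30% founder stake
EMPLOYEE_BPS = 50        # hypothetical 0.5% early-employee stake

if __name__ == "__main__":
    print(f"founder:  ${payout(EXIT_VALUE, FOUNDER_BPS):,}")
    print(f"employee: ${payout(EXIT_VALUE, EMPLOYEE_BPS):,}")
```

Under these assumptions the founder clears $15 million while the employee gets $250,000, and the employee would need roughly a $1 billion exit to reach $5 million, which is the winner-take-all pressure Sahil describes.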

Software at Scale 40 - Talent Management with Nikita Gupta

2022-01-07 · 35:35

Nikita Gupta is a Co-Founder & CTO at Symba, a platform that helps manage talent development programs like internships.

Internships are one of the most effective hiring channels for a software company, but a lot of work goes into making interns successful. With hiring getting harder across the industry due to increased competition and funding, I thought it would be interesting to dive into how to run a successful internship program.

Highlights
0:30 - What is Symba?
1:30 - Starting with the hot takes. So, are college degrees overrated now?
5:30 - Why do I need a software platform to manage internships?
8:50 - Why do companies generally need to manage 8-10 platforms for internships? What have you seen?
10:30 - As a software engineer or manager, how do I make my intern successful?
13:30 - Cadence of check-ins
16:30 - With remote interns, how do you build a successful community?
18:50 - How do I measure the success/efficacy of my internship program?
21:00 - How do I know that my intern mentors/hosts are doing a good job?
25:00 - What are some concrete steps that I can take to increase my intern pool’s diversity? What should I track?
27:30 - What are some trends in the intern hiring space?
32:00 - Government investments in internship programs
33:00 - What’s your advice to the first-time intern mentor/host?

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
