PurePerformance

282 Episodes

SREs must not be your SWAT Teams with Dana Harrison

2024-04-08 · 01:01:21

SREs (Site Reliability Engineers) play varying roles across organizations: codifying infrastructure, handling high-priority incidents, automating resiliency, ensuring proper observability, defining SLOs or getting rid of alert fatigue. What an SRE team must not be is a SWAT team, or, as Dana Harrison, Staff SRE at Telus, puts it: "You don't want to be the fire brigade along the DevOps Infinity Loop."
In his years as an SRE, Dana also ran one-week boot camps for developers, teaching them how to make apps observable, log properly, apply resiliency architecture patterns and define good SLIs and SLOs. He talked about the three things that form the foundation of a good SRE: understand the app, understand the current state, and make sure you know your systems are down before your customers tell you!
If you are interested in seeing Dana and his colleagues from Telus talk about their observability and SRE journey, check out the on-demand session from Dynatrace Perform 2024: https://www.dynatrace.com/perform/on-demand/perform-2024/?session=simplifying-observability-automations-and-insights-with-dynatrace#sessions

Why GitOps is not Git plus Automation for Ops with Roberth Strand

2024-03-25 · 55:48

Whether it's GitOps, DevOps, Platform Engineering, Observability as a Service or other terms, we all have our own definitions, but rarely do we reach a consensus on what those terms really mean! To get some clarity, we invited Roberth Strand, CNCF Ambassador and Azure MVP, who has been passionately advocating for GitOps as it was initially defined and explained by Alexis Richardson of Weaveworks in his blog post "What is GitOps Really". Tune in and learn about Desired State Management, Continuous Pull vs. Pushing from Pipelines, how Progressive Delivery and Auto-Scaling fit into declaring everything in Git, what OpenGitOps is, and why this podcast will help you get your GitOps certification (coming soon).
As we had a lot to talk about, we also touched on Platform Engineering and various other topics.
Here are all the links we discussed:
Alexis' GitOps Blog Post: https://medium.com/weaveworks/what-is-gitops-really-e77329f23416
OpenGitOps: https://opengitops.dev/
Flux Image Reflector: https://fluxcd.io/flux/components/image/
CNCF White Paper on Platform Engineering: https://tag-app-delivery.cncf.io/whitepapers/platforms/
Platform Engineering Maturity Model: https://tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model/
Platform Engineering Working Group as part of TAG App Delivery: https://tag-app-delivery.cncf.io/wgs/platforms/

What makes GitOps Enterprise Ready with Christian Hernandez

2024-03-11 · 52:34

Can you explain GitOps in simple terms? How does it fit into Continuous Integration (CI), Continuous Delivery and Continuous Deployment? And what should you consider when rolling out GitOps in an enterprise? To get answers to those questions, we sat down with Christian Hernandez, Head of Community at Akuity, who has a fabulous analogy for explaining GitOps that I am sure many of us will "borrow" from him. Christian also walks us through the ecosystem he works in, such as ArgoCD and Kargo, as well as OpenGitOps, which aims to provide open-source standards and best practices for implementing GitOps.
We closed the session with some advice on Application Dependency Management, the External Secrets Operator and choosing the right Git repo structure.
Here are some of the links we discussed:
OpenGitOps: https://opengitops.dev/
ArgoCD: https://argoproj.github.io/cd/
Kargo: https://github.com/akuity/kargo
ArgoCon: https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/co-located-events/argocon/
GitOpsCon: https://events.linuxfoundation.org/gitopscon-north-america/

Open Mainframe, Zowe, OpenTelemetry: Modernizing the Mainframe with Jessielaine Punongbayan

2024-02-26 · 44:08

While the mainframe powers the world's most critical systems, the words "modern", "open source" or "generative AI" typically don't come to mind. So let's change this!
To do that, tune in to our latest episode, where Jessielaine (Jelly) Punongbayan, Sr. Technical Support Engineer at Dynatrace, tells us why she is excited about the modern mainframe and how it brought her from the Philippines via Singapore and the Czech Republic to Austria. We learn about the open-source projects and communities she is involved in, such as Open Mainframe and Zowe, which make it easy to connect the mainframe with the modern tooling of today's development environments. Jelly shares stories about the role of good observability, how it connects the distributed and mainframe worlds, and how it enables development teams to build more efficient systems. And what about AI? Well, you have to listen to the end!
Here are the links discussed in the episode:
Writing a COBOL program using VSCode: https://medium.com/modern-mainframe/beginners-guide-cobol-made-easy-introduction-ecf2f611ac76
Using CircleCI to perform automation in Mainframe: https://medium.com/modern-mainframe/beginners-guide-cobol-made-easy-leveraging-open-source-tools-eb4f8dcd7a98
Using OpenTelemetry to capture Mainframe Insights: https://medium.com/@jessielaine.punongbayan/re-imagining-mainframe-insights-through-open-source-tooling-79dd4c937114
Dynatrace support for Mainframe: https://www.dynatrace.com/technologies/mainframe-monitoring/

The 201 Milestone Episode on Automation, AI, CoPilot and more with Mark Tomlinson

2024-02-12 · 58:28

201 is the HTTP status code for "Created". It is also the number of PurePerformance episodes (including this one) we have published over the past years. Nobody better to invite than the person who initially inspired us to launch PurePerformance: Mark Tomlinson, Performacologist and Director of Observability at FreedomPay.
Tune in and listen to our thoughts on the current state of automation, a recap of IFTTT, and whether we believe that AIs such as CoPilot will not only make us more efficient at creating code and scripts but also lead to new ways of automation. We also give a heads-up (or rather a recap) of what Mark will be presenting at Perform 2024.
To learn more about and from Mark, follow him on his social media channels:
LinkedIn: https://www.linkedin.com/in/mtomlins/
Performacology: https://performacology.com/

Optimizing Cloud Native Power Consumption using Kepler with Marcelo Amaral

2024-01-29 · 47:34

Marcelo Amaral is a researcher focusing on cloud system optimization and sustainability. Given his background in performance engineering, where he optimized microservice workloads in containerized environments, the leap to analyzing and optimizing energy consumption was an easy one.
Tune in to this episode and learn about Kepler, the CNCF project Marcelo is working on, which provides metrics for workload energy consumption based on power models trained by the community. Marcelo goes into detail about how Kepler works and offers practical advice to help any developer keep energy consumption in mind when making architectural and coding decisions.
To learn more about Kepler and today's episode, check out:
LinkedIn from Marcelo: https://www.linkedin.com/in/mcamaral/
CNCF Blogpost on Kepler: https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
Kepler GitHub Repo: https://github.com/sustainable-computing-io/kepler

OpenLLMetry - Observing the Quality of LLMs with Nir Gazit

2024-01-15 · 50:32

It's only been a year since ChatGPT was introduced. Since then, we have seen LLMs (Large Language Models) and generative AI being integrated into everyday software applications. Developers face the hard choice of picking the right model for their use case to produce the output quality their end users demand.
Tune in to this session, where Nir Gazit, CEO and Co-founder of Traceloop, educates us on how to observe and quantify the quality of LLMs. Besides performance and costs, engineers need to look at quality attributes such as accuracy, readability or grammatical correctness.
Nir introduces us to OpenLLMetry, a set of open-source extensions built on top of OpenTelemetry that provides automated observability into LLM usage, so developers can better understand how to optimize it. His advice to every developer is to start measuring the quality of your LLMs on day one and to keep evaluating as you change your model, the prompt and the way you interact with your LLM stack!
If you have more questions about LLM observability, check out the following links:
OpenLLMetry GitHub Page: https://github.com/traceloop/openllmetry
Traceloop Website: https://www.traceloop.com/
OpenLLMetry Documentation: https://traceloop.com/docs/openllmetry

Why Developers have different Observability Requirements with Liran Haimovitch

2024-01-01 · 49:49

After analyzing distributed traces for more than 15 years, Brian and I thought that everyone in software engineering and operations must be satisfied with all the observability data we have available. But maybe Brian and I were wrong, because we didn't fully understand all the use cases, especially those of developers who must fix code in production or need to quickly understand what someone else's code is really doing, without the luxury of adding another log line and redeploying on the fly. To learn more about developers' observability requirements, we invited Liran Haimovitch, CTO at Rookout, now part of Dynatrace, who has spent the last 7 years solving the challenging problems developers face day and night. Tune in and learn what non-breaking breakpoints are, how it is possible to "debug in production" without impacting running code, and how we can make developers' lives easier even though we push so many things "to the left".

Mobile, AI, LLMs, Observability & Resiliency - Key Topics for Banks in Hungary with Adam Gajdi

2023-12-18 · 11:21

I was invited to speak at BankTechShow in Budapest, Hungary, where the nation's IT leaders in the banking sector presented and discussed the future of banking, both in the cloud and in terms of what it means for physical bank branches. I got a chance to sit down with Adam Gajdi, IT Solutions CoE Lead at K&H, who walked me through their recent launch of a new mobile banking app. Adam highlighted the importance of observability for business owners and developers alike. He also told me that Hungarian banks are mandated to conduct chaos tests to prove that their systems are resilient to data center outages. Naturally, I was also curious about how AI, LLMs and other technologies are being adopted in their sector. Tune in to learn more!

Recap KubeCon 2023 NA, State of Platform Engineering and more with Andi Grabner

2023-12-04 · 28:21

Besides attending KubeCon 2023 NA, Andreas (Andi) Grabner, co-host of PurePerformance but today's guest, also travelled across parts of the US to chat with the broader observability community about topics such as Platform Engineering, Observability, DevOps, Automation and Security.
Tune in for a quick recap of all the topics Andi picked up on his recent trip.

Observability, Cybersecurity, DevOps & SRE - Learning from the Public Sector with Willie Hicks

2023-11-20 · 46:28

Zero-Trust Architectures. Data-Flow Inventory. User Experience First! Those are key initiatives in the public sector to ensure that digital services delivered to citizens around the globe not only work with a flawless user experience but are also safe from bad actors trying to disrupt agencies at the local, state and federal levels.
In this episode, we invited Willie Hicks, Federal CTO at Dynatrace, to learn more about the state of observability and security at the government agencies Willie has worked with over the past decade. In our conversation, we explore the differences between the commercial and government sectors when it comes to ROI and how each sees competition as a driving motivator.
To learn more about the public sector, tune in to the Tech Transformers podcast that Willie co-hosts with his colleague Carolyn Ford.

Blue turns Green: Sustainable IT is everyone's business with Mario-Leander Reimer

2023-11-06 · 51:04

4% of worldwide CO2 emissions come from IT, and as in all other industries, we have big potential not only to reduce our carbon footprint but also to lower costs.
Tune in to our episode where Mario-Leander Reimer, CTO at QAware GmbH, talks about his top 3 suggestions for sustainable IT: making the right architectural choices, right-sizing your environments, and shutting down environments that are not needed!
Mario is also heavily involved in the CNCF and gives us an overview of projects to look into, such as Kepler, kube-green, Karpenter or carbon-aware multi-cluster schedulers.
Here are the links we discussed:
Blue turns Green presentation: https://speakerdeck.com/lreimer/blue-turns-green-approaches-and-technologies-for-sustainable-k8s-clusters-number-kcdmunich?slide=5
Kepler Project: https://github.com/sustainable-computing-io/kepler
kube-green: https://kube-green.dev/
CNCF TAG Environmental Sustainability: https://github.com/cncf/tag-env-sustainability
Sustainability Week: https://tag-env-sustainability.cncf.io/cloud-native-sustainability-week/

Don't burst in Flames: 20 years of Performance Engineering with Martin Spier

2023-10-23 · 49:43

Martin Spier was one of six engineers who took care of all of Netflix's operations about 10 years ago. Back then, performance and observability tools weren't as sophisticated as some are today and didn't scale to Netflix's needs. FlameScope was one of the open-source projects that came out of that period. It visualizes flame graphs on a time-scaled heatmap to identify the specific performance patterns that caused issues in their complex systems.
Tune in to this episode and hear more performance and observability stories from Martin: his early days in Brazil, his time at Expedia and Netflix, and his current role as VP of Engineering at PicPay, one of the hottest fintechs in Brazil.
More links we discussed:
Performance Summit talk about FlameCommander: https://www.youtube.com/watch?v=L58GrWcrD00
CMG Impact talk on Real User Monitoring at Netflix: https://www.cmg.org/2019/04/impact-2019-real-user-performance-monitoring-at-netflix-scale/
Learn more about Vector: https://netflixtechblog.com/extending-vector-with-ebpf-to-inspect-host-and-container-performance-5da3af4c584b
Martin's GitHub: https://github.com/spiermar
Connect with him on LinkedIn: https://www.linkedin.com/in/martinspier/

Inside Africa - Cloud Native Observability Journeys with Kelvin Klein

2023-10-09 · 17:08

Africa is not only the second-largest continent in the world, it's also at the top when it comes to adopting cloud native technologies. I was fortunate to spend a week in South Africa, much of it with Kelvin Klein, Dynatrace Product Manager at Mediro ICT. After two observability events in Johannesburg and Cape Town and several meetings with local tech leaders, I got to sit down with Kelvin and learn more about the state of observability, cloud native and security in South Africa.

The Future of Ops is Sleep with Amit Chiba from Nedbank

2023-09-25 · 10:59

I was fortunate to travel to South Africa and meet many tech leaders in Johannesburg and Cape Town to talk about Observability, Security, Automation, Platform Engineering, DevOps and FinOps. One of those leaders is Amit Chiba, Multi Product Specialist at Nedbank, one of the leading financial institutions in South Africa. I sat down with Amit to discuss his personal journey and his projects at Nedbank. Tune in and hear from Amit how self-service platform engineering helps them scale observability, how they tackle cloud costs and why he thinks the future of IT Ops means more sleep!

Developer Productivity Engineering: It's more than buying faster hardware with Trisha Gee

2023-09-11 · 44:29

Do you measure build times, both on your shared CI and for local builds on developers' workstations? Do you measure how much time devs spend debugging code or trying to understand why tests or builds are suddenly failing? Do you treat your pre-production environments with the same respect as your production environments?
Tune in and hear from Trisha Gee, Developer Champion at Gradle, who has helped development teams reduce wait times, become more productive with their tools (gotta love that IDE of yours) and understand the impact of their choices on other teams (when log lines wake people up at night). Trisha explains in detail what there is to know about DPE (Developer Productivity Engineering), how it fits into Platform Engineering, why adding more hardware is not always the best solution, and why flaky tests are a topic she is passionate about.
Here are the links to Trisha's social media, her books and everything else we discussed during the podcast:
LinkedIn: https://www.linkedin.com/in/trishagee/
Trisha's Website: https://trishagee.com/
Trisha's Talk on DPE: https://trishagee.com/presentations/developer-productivity-engineering-whats-in-it-for-me/
Trisha's Books: https://trishagee.com/2023/07/31/summer-reading-2023/
Dave Farley on Continuous Delivery: https://www.youtube.com/channel/UCCfqyGl3nq_V0bo64CjZh8g

Serverless Observability needs a paradigm shift with Toli Apostolidis

2023-08-28 · 01:00:38

Only a few can claim to have successfully built a pure-serverless architecture, and only they really understand the challenges of observing truly event-driven architectures. Apostolis Apostolidis (also known as Toli) is one of those people, which is why we invited him back to discuss the lessons learned from his time as Head of Engineering Practices at cinch. Tune in and learn about the evolution of serverless observability and the challenges of observing API Gateways, Queues and Step Functions. Listen to Toli's advice on picking one observability vendor, doing your own custom instrumentation, and familiarizing yourself with the observability data from your managed service provider.
Also go back to our previous episode to hear more about his Engineering Practices for Success, and remember that the time to ask about cold starts is over 🙂
Additional links we discussed today:
Previous Podcast with Toli: https://www.spreaker.com/user/pureperformance/unlocking-the-power-of-observability-eng
OpenTelemetry: https://opentelemetry.io/
AWS Step Functions: https://aws.amazon.com/step-functions/
Dynatrace Business Flow: https://www.youtube.com/watch?v=W0bSzvQrUzA

Practical Platform Engineering vs the Marketing Hype with Mauricio (Salaboy) Salatino

2023-08-14 · 54:06

Codifying Golden Paths that ideally don't require you to build a K8s Operator! This is what Practical Platform Engineering should look like!
In our latest episode, we learn from Mauricio (Salaboy) Salatino, who has been contributing to open source for the past 12 years. Tune in and learn from his journey of designing and building platforms. He shares his opinion on Platform Engineering skill sets, how to design for self-service and how to pick the right tools out of the 160+ CNCF project options, and he shares some of his favorite tools (including Crossplane, VCluster, Argo, OpenFeature, Keptn ...) that should be part of a modern cloud native platform.
Links discussed in this podcast:
Salaboy on Twitter: https://twitter.com/salaboy
Salaboy on LinkedIn: https://www.linkedin.com/in/salaboy/
Upcoming Book: https://www.salaboy.com/book/
Cloud-Native Snapshots: https://www.salaboy.com/cloud-native-snapshots/
Diagrid: https://www.diagrid.io/

Sifting through the Noise of Platform Engineering with Saim Safdar

2023-07-31 · 48:12

Reducing cognitive load by simplifying computing for every developer in an organization! That is one of the many definitions of Platform Engineering. But what is Platform Engineering really? Just a new hype? What problem does it actually solve? How does it relate to DevOps and SRE? Are there any standards or reference architectures available?
To get a new perspective on Platform Engineering, we invited Saim Safdar, CNCF Ambassador and member of the CNCF TAG App Delivery Platform Working Group. Tune in and learn about the Platform Maturity Model, how to get involved in shaping the field of Platform Engineering, which of the people Saim has interviewed are worth following, and much more.
Here are the links we discussed:
CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms
Maturity Model Working Document: https://docs.google.com/document/d/1bP8-LQ-d41eIdQB3IC2YsncDhawpFLggql2JxwtE0XI/edit
Platform Working Group: https://tag-app-delivery.cncf.io/about/wg-platforms/
Cloud Native Podcast with Alexis Richardson: https://www.youtube.com/watch?v=p6D-NYkVp9E
Patterns and Anti-Patterns: https://octopus.com/devops/platform-engineering/patterns-anti-patterns/
Saim on LinkedIn: https://www.linkedin.com/in/saim-safder/

Unlocking the Power of Observability: Engineering Practices for Success with Toli Apostolidis

2023-07-17 · 47:24

Are you frustrated with your team's ability to troubleshoot issues in production despite their proficiency in pushing out new builds? The root of this problem may lie in the absence of Observability Driven Development. In our latest episode, we are joined by Apostolis Apostolidis (also known as Toli), who, as Head of Engineering Practices at cinch, has spent the past years enabling teams to take the easiest path to value. He is passionate about DevOps and has strong opinions on how to educate engineers on "Consciously Instrumenting Code for good Observability".
Tune in to learn more about good engineering practices, building internal communities of practice, the benefits of traces over metrics and logs, and why we need to start adding observability to our CVs and LinkedIn profiles.
Here are all the relevant links we discussed in this episode:
Toli's Website: https://www.toli.io/
Toli's LinkedIn Profile: https://www.linkedin.com/in/apostolosapostolidis/
Toli on Twitter: https://twitter.com/apostolis09/
WTFisSRE Talk on DevOps Meets Service Delivery: https://www.youtube.com/watch?v=nLrx0BCMl0Y
GOTO talk on EDA in Practice: https://www.youtube.com/watch?v=wM-dTroS0FA
