The GeekNarrator memberships can be joined here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join
Membership will get you access to member-only videos, exclusive notes and a monthly 1:1 with me.
Here you can see all the member-only videos: https://www.youtube.com/playlist?list=UUMO_mGuY4g0mggeUGM6V1osdA

About this episode:
In this conversation, Unmesh Joshi discusses the patterns of distributed systems. He emphasizes the importance of understanding the context in which patterns are applied, the need to read code to grasp their implementation, and the common pitfalls that developers face when applying patterns without a clear understanding of the underlying problems.

Chapters:
00:00 Introduction to Distributed Systems and Patterns
05:39 Understanding Patterns in Distributed Systems
19:23 Bridging Theory and Practice in Distributed Systems
28:56 The Role of Developers in Understanding Patterns
31:58 Understanding Patterns in Software Development
40:58 The Human Aspect of Software Design
44:37 Iterative Development and Real-World Applications
49:03 The Future of Patterns in Cloud-Native Systems
55:07 Common Misunderstandings of Distributed Patterns

Interesting quotes:
"Patterns capture wisdom of generations."
"Reading code is the best way to understand."
"Patterns help you see beyond abstractions."
"Understanding patterns helps bridge the gap."
"Expert generalists can operate across verticals."
"There are no simple systems in the cloud era."
"Patterns can add complexity if misunderstood."
"Patterns are always useful within a context."
"Design and development are human activities."
"The deconstruction of databases is happening."
"Paxos is the most misunderstood pattern."

Unmesh Joshi: https://in.linkedin.com/in/unmesh-joshi-9487635
Catalog of Patterns: https://martinfowler.com/articles/patterns-of-distributed-systems/

I hope you liked the episode. If you did, please like, share and subscribe.

Like building real stuff?
Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, SQLite. Use the link below to sign up and get 40% off on a paid subscription.
https://app.codecrafters.io/join?via=geeknarrator

LIKE, SHARE and SUBSCRIBE
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

#distributedsystems #patterns #softwarearchitecture #consensus #algorithms #coding #softwaredevelopment #ThoughtWorks #softwareengineering #cloud #computing #software

About this episode:
In this episode of The GeekNarrator podcast, host Kaivalya Apte interviews Marc Brooker, a distinguished engineer at AWS, about Aurora DSQL. They discuss Marc's journey at AWS, the evolution of Aurora DSQL, and the customer-centric approach that led to its development. Marc explains the choice of PostgreSQL as the foundation for DSQL, the architecture of the database, and the importance of snapshot isolation and concurrency control. The conversation covers the technical aspects of DSQL, including the write process and how atomicity is maintained.

The conversation also goes deep into database design, focusing on fault tolerance, replication strategies, and the role of Firecracker VMs in enhancing scalability. Marc discusses the architecture of Aurora DSQL, emphasizing the importance of transaction management, the challenges of active-active deployments, and the trade-offs involved in database design. The discussion also highlights use cases for Aurora DSQL, including its suitability for microservices and serverless architectures, and addresses scenarios where it may not be the best fit.

Chapters:
00:00 Introduction to Aurora DSQL and Marc Brooker's Journey
03:38 The Evolution of Aurora DSQL at AWS
09:24 Customer-Centric Development and Technological Enablers
12:50 Why PostgreSQL? The Choice Behind DSQL
16:39 High-Level Architecture of DSQL
22:07 Understanding Snapshot Isolation and Concurrency Control
28:45 The Write Process and Atomicity in DSQL
38:50 Designing Fault Tolerance in Databases
47:38 Replication and Transaction Commit Strategies
54:35 Active-Active Deployment and Fault Tolerance
01:00:14 Role of Firecracker VM in Scalability
01:09:27 Use Cases and Trade-offs of Aurora DSQL

Marc's Blog: https://brooker.co.za/blog/
Marc on Aurora DSQL: https://brooker.co.za/blog/2024/12/03/aurora-dsql.html
AWS's documentation on Aurora DSQL: https://aws.amazon.com/rds/aurora/dsql/features/
#sql #postgres #databasesystems #aws #awsdevelopers #spanner #google #cockroachdb #yugabytedb #cap #scalability #WAL #DistributedSystems #Cloud #aurora

About this episode:
Hey folks, in this episode we have Jelte with us, the main contributor to the pg_duckdb project, a Postgres extension that brings the power of #duckdb to our beloved #postgresql. We will try to understand how it works, why it is needed, and what the future of pg_duckdb looks like. If you love #Postgres or #DuckDB, or just enjoy understanding #database internals, this episode will give you solid insights into Postgres query processing, DuckDB analytics, the Postgres extension ecosystem and more.

Basics:
pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.

Chapters:
00:00 Introduction to pg_duckdb
03:40 Understanding the Integration of DuckDB with Postgres
06:23 Architecture of pg_duckdb: Query Processing Explained
10:02 Configuring DuckDB for Analytics Queries
15:37 Managing Workloads: Transactional vs. Analytical
21:02 Observability and Debugging in DuckDB
25:58 Data Deletion and GDPR Compliance
30:46 Schema Management and Migration Challenges
33:14 Managing Schema Changes in Databases
35:21 Upgrading Database Extensions
36:33 Enhancing Data Reading Methods
38:33 Future Features and Improvements
45:54 Use Cases for pg_duckdb
50:03 Challenges in Building the Extension
55:25 Getting Involved with pg_duckdb

Important links:
The DuckDB Discord server, which has a pg_duckdb channel inside it: https://discord.duckdb.org/
Repo: https://github.com/duckdb/pg_duckdb
good-first-issue issues: https://github.com/duckdb/pg_duckdb/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
#sql #postgres #databasesystems

About this episode:
In this episode we are talking to Peter and Qian, co-founders of DBOS. The conversation covers the challenges of creating fault-tolerant applications, the architecture of DBOS, and how it addresses reliability at multiple layers.

Chapters:
00:00 Introduction to The GeekNarrator Podcast
00:29 Meet the Co-Founders of DBOS
01:25 The Core Problem: Building Reliable Systems
02:05 How DBOS Solves Reliability Issues
04:29 Understanding DBOS Architecture
06:09 Deep Dive into DBOS Library
08:36 Postgres and State Management
18:31 Handling Parallel Steps and Performance Concerns
26:00 Observability and Version Control
30:18 Running Multiple Code Versions
30:58 Managing Workflow Versions
32:03 Surgery on Workflow States
33:15 Library Annotations and Durable Execution
34:24 Migrating to the Cloud Version
37:23 Handling Email Workflows
42:41 Transactional Guarantees with Postgres
48:44 Technical Challenges and Multi-Tenancy
54:12 Real-World Use Cases and Benefits
59:45 Conclusion and Final Thoughts

Some important links:
- Main website: https://www.dbos.dev/
- DBOS docs: https://docs.dbos.dev/
- Open-source DBOS Transact libraries:
  - Python: https://github.com/dbos-inc/dbos-transact-py
  - TypeScript: https://github.com/dbos-inc/dbos-transact-ts

Deep Dive into Databases with Peter Zaitsev | The GeekNarrator Podcast

Join host Kaivalya Apte and special guest Peter Zaitsev from Percona on this episode of The GeekNarrator Podcast. They discuss Peter's journey into the world of databases, founding Percona, and the evolution of open source database solutions. Topics include the rise of PostgreSQL, the comparison between MySQL and PostgreSQL, database observability, the impact of cloud and Kubernetes on database management, licensing changes in popular databases like Redis, and career advice for database administrators and developers. Stay tuned for insights on the future of databases, observability strategies, and the role of AI in database management.

Chapters:
00:00 Introduction and Guest Welcome
00:14 Peter's Journey into Databases
04:15 The Rise of PostgreSQL vs MySQL
18:17 Challenges in Managing Database Clusters
24:36 Common Developer Mistakes with Databases
30:59 MongoDB's Success and Future
34:53 Redis and Licensing Changes
37:07 Elastic's License Change and Its Impact
38:25 Redis Fork and Industry Collaboration
40:27 Kubernetes and Cloud-Native Databases
47:47 Challenges in Database Upgrades and Migrations
54:58 Load Testing and Observability
01:09:02 Future of Database Administration and Development
01:15:13 Conclusion and Final Thoughts

Join Kaivalya Apte and Simon Hørup Eskildsen from Turbopuffer as they talk about the complexities of building a database on top of object storage. Discover the key challenges, the nuances of various storage formats, and the critical trade-offs involved. Learn from Simon's experience, from his time at Shopify to creating Turbopuffer. This episode covers everything from approaches to write-ahead logs to multi-tenancy and object storage advancements. Perfect for database enthusiasts and those keen on first-principles thinking!

Chapters:
00:00 Introduction
00:17 Simon's Background and Journey to Turbopuffer
02:42 Challenges in Database Scalability
04:21 Experimenting with Vector Databases
05:02 Cost Implications of Vector Databases
05:52 Architectural Considerations for Search Workloads
07:39 Building a Database on Object Storage
16:14 Designing a Simple Database on Object Storage
26:01 Handling Multiple Writers and Consistency
31:26 Trade-offs in Write Operations
32:36 Optimizing MySQL Write Performance
34:03 Batching Writes in Object Storage
35:08 Time-Based vs Size-Based Batching
36:32 Understanding Amplification in Databases
42:26 Challenges with Cold Queries
44:02 Building and Persisting B-Trees
50:53 Separating Workloads in Databases
56:07 Multi-Tenancy Challenges
01:00:39 Choosing Storage Formats
01:06:10 Key Innovations in Object Storage Databases

Important links:
- https://github.com/sirupsen/napkin-math (numbers)
- https://turbopuffer.com/
- https://turbopuffer.com/architecture
- https://sirupsen.com/napkin/problem-10-mysql-transactions-per-second
- https://sirupsen.com (my blog, napkin math)
- https://sirupsen.com/subscribe (napkin math newsletter)
- https://github.com/rkyv/rkyv (rkyv, Rust)

Welcome to The GeekNarrator podcast! In this episode, host Kaivalya Apte goes deeper into the practical applications of formal methods with Jack Vanlightly, a principal technologist at Confluent. With years of experience in distributed systems, Jack discusses his journey and how formal methods have been instrumental in system design verification and bug detection. The conversation covers Jack's background, his process of using formal methods, the significance of modelling, verification, documentation, and systems learning, as well as the future evolution of tooling and its applications. Tune in to understand how formal methods can change your approach to distributed systems!

Chapters:
00:00 Introduction to the episode
00:37 Meet Jack Vanlightly: Principal Technologist at Confluent
02:17 Jack's Journey into Distributed Systems
04:29 Discovering the Power of Formal Methods
08:11 Modeling and Simulation in Formal Methods
13:43 Verification and Safety Properties
19:02 Documentation and Communication Challenges
20:43 Formal Methods as a Systems Learning Tool
24:26 Practical Applications and Case Studies
56:38 Future of Formal Verification and Closing Thoughts

Jack's Blog: https://jack-vanlightly.com/

Database Internals - NileDB: Postgres Re-engineered for Multitenant Apps with Gwen Shapira

Join us in this episode as we dive deep into NileDB, a database re-engineered for multi-tenant applications. Our special guest, Gwen Shapira, co-founder of NileDB and a notable figure in the database field, shares her insights and technical know-how on solving the common challenges faced by multi-tenant SaaS applications. From the benefits of using Postgres as the underlying database to the tenant isolation features of NileDB, we cover it all. Don't miss out on learning about AI-native capabilities, handling schema migrations, and ensuring zero-downtime data migrations.

Chapters:
00:00 Introduction
07:19 Challenges in Multi-Tenant Databases
11:09 Tenant Isolation and NileDB's Approach
34:16 Necessary Modifications for Tenant Data
37:47 Zero Downtime Data Migrations
44:32 Handling Schema Migrations
59:11 AI Use Cases and Vector Embedding Storage
59:51 Technical and Non-Technical Learnings from Building Nile
01:05:03 Future Plans and Upcoming Features

NileDB: https://www.thenile.dev/
Blog: https://www.thenile.dev/blog
Gwen's LinkedIn: https://www.linkedin.com/in/gwenshapira
Gwen's Twitter: https://twitter.com/gwenshap

#postgres #sql #ai

Building a Continuous Profiler with Frederic from Polar Signals | The GeekNarrator Podcast

In this episode we chat with Frederic from Polar Signals. We dive deep into the intricacies of building a continuous profiler, the challenges faced, and the solutions developed by Polar Signals. Frederic shares insights from his background in observability and discusses the innovations in FrostDB, a custom columnar database designed for high-performance query and storage of profiling data.

Chapters:
00:00 Introduction
00:29 Frederic's Background
03:40 What is Continuous Profiling?
06:56 Challenges in Data Collection
18:22 Profiling Data Ingestion and Storage Architecture
27:23 Querying Data
28:52 High Cardinality Data and Cost Optimization
23:39 Tenant Isolation and Load Management
41:24 Performance Optimizations
46:02 Testing & Deterministic Simulation
50:33 Technical and Organizational Learnings
54:32 Future of Polar Signals
56:21 Conclusion

You can check more about Polar Signals here: https://www.polarsignals.com/
#distributedsystems #systemdesign

Welcome back to another episode! Today, I have a special guest, Chris Riccomini, joining me to talk about databases. In this episode, we focus on SlateDB, a new database that's making waves in the tech community. We cover a wide range of topics, including the architecture of SlateDB, its internals, design decisions, and some interesting use cases. Chris, a seasoned software engineer with a background at LinkedIn and WePay, shares his journey and the motivations behind creating SlateDB.

🎙️ Chapters:
00:00 Introduction to the Topic and Guest
01:58 Chris Riccomini's Background and Experience
04:19 The Genesis of SlateDB
04:54 Understanding SlateDB's Architecture
10:22 The Rise of Object Storage in Databases
13:43 Exploring SlateDB's Features and Trade-offs
32:54 Understanding Latency Trade-offs
34:12 Exploring Storage Formats and Manifest Files
37:25 Caching Strategies and Optimizations in SlateDB
50:21 Consistency Guarantees and Transactionality
52:36 Integration and Resource Management in SlateDB
56:04 Future Prospects and Use Cases for SlateDB

SlateDB: https://slatedb.io/
More about Chris: https://cnr.sh/

#distributedsystems #systemdesign #formalmethods

In this video I talk to Jayaprabhakar Kadarkarai, aka JP, the founder of FizzBee. FizzBee is a design specification language and model checker that helps developers verify their design before writing even a single line of implementation code. We discuss where it is applicable, what the benefits are, how it works, and many other interesting challenges, with examples.

Chapters:
00:00 Introduction
01:13 Challenges in Designing Distributed Systems
03:13 Understanding Design Specification Languages
04:00 The Value of Structured Design Documents
09:00 When to Use Design Specification Languages
21:27 Modeling a Travel Booking System
22:51 Ensuring Atomicity in Distributed Systems
26:09 Handling Failures and Consistency
34:45 Refinement in System Design
35:38 Balancing Abstraction and Implementation
37:53 Common Pitfalls in Modeling and Implementation
40:02 Challenges in System Design and Implementation
40:12 Two-Way Feedback in System Design
41:01 Performance Considerations in Implementation
41:36 Importance of Solid Design Blueprints
41:56 Model-Based Testing and Continuous Integration
43:27 Updating Design Documentation
44:38 Simulation Testing vs. Model Checking
45:32 Design Issues and Formal Verification
49:51 Applying Formal Verification to Existing Systems
55:35 Common Design Problems and Solutions
01:07:57 Future Enhancements in Design Specification Tools
01:12:50 Getting Started with FizzBee

FizzBee: https://fizzbee.io/
Get in touch with JP: https://www.linkedin.com/in/jayaprabhakar
#distributedsystems #systemdesign #formalmethods

In this episode of The GeekNarrator podcast, hosted by Kaivalya Apte, we welcome a special guest, Kishore Gopalakrishna from StarTree, co-author of Apache Pinot and other notable projects. Kishore shares his extensive experience in building real-time analytics and streaming systems, including Apache Pinot, Espresso, Apache Helix, and ThirdEye. The episode covers the motivations and challenges behind creating these systems, the innovations they brought to distributed systems, and the impact of community on open-source projects. Kishore also discusses the evolution of testing methodologies, cost optimizations in transactional and analytical systems, and key considerations for companies evaluating real-time analytics solutions.

Chapters:
00:00 Introduction
03:13 Building Distributed Systems at LinkedIn
08:57 Testing and Challenges in Distributed Systems
30:50 Advantages of Columnar Storage
33:04 The Importance of Upserts
34:24 Building a Strong Open Source Community
41:10 Challenges and Lessons in System Design
51:35 Real-Time Analytics: Do You Need It?

StarTree: https://startree.ai/
Apache Pinot: https://pinot.apache.org/

#distributedsystems #kafka #s3 #streaming #realtimeanalytics #database #pinot #startree
In this episode of The GeekNarrator podcast, host Kaivalya Apte interviews Ryan and Richie, the founders of WarpStream. They discuss the architecture, benefits, and core functionalities of WarpStream, a drop-in replacement for Apache Kafka. The conversation covers their experience with Kafka, the design decisions behind WarpStream, and the operational challenges it addresses. They also delve into the seamless migration process, the scalability and cost benefits, the integration with the Kafka ecosystem, and potential future features. This episode is a must-watch for developers and tech enthusiasts interested in modern, distributed data streaming solutions. Chapters: 00:00 Introduction 02:27 Introducing WarpStream: A Kafka Replacement 11:07 Deep Dive into WarpStream's Architecture 35:42 Exploring Kafka's Ordering Guarantees 36:52 Handling Buffering and Compaction 38:44 Efficient Data Reading and File Caching 44:06 WarpStream's Flexibility and Cost Efficiency 01:06:59 Future Features Links: WarpStream: https://www.warpstream.com/ Blog: https://www.warpstream.com/blog X: Ryan: https://x.com/ryanworl Richard Artoul: https://x.com/richardartoul Kaivalya Apte: https://x.com/thegeeknarrator #distributedsystems #kafka #s3 #streaming
Exploring XTDB with Jeremy Taylor & Malcolm Sparks: An In-Depth Dive into Immutability and Database Internals. In this episode of The GeekNarrator podcast, host Kaivalya is joined by Jeremy Taylor and Malcolm Sparks from Juxt to explore XTDB, an immutable database designed to handle complex historical and financial data with precision. They delve into the architecture, internal mechanics, and use cases while discussing the importance of immutability. Whether you're a developer interested in databases or someone curious about data management and history tracking, this discussion offers invaluable insights. Chapters: 00:00 Introduction 02:51 Challenges with General Purpose Databases 11:50 XTDB: A New Approach to Databases 31:56 Understanding Kafka and XTDB Integration 36:06 Querying and Indexing in XTDB 40:31 Temporal Data Management and Use Cases 54:52 Deployment and User Experience XTDB: https://xtdb.com/ XTDB GitHub: https://github.com/xtdb/xtdb Juxt: https://www.juxt.pro/ Juxt GitHub: https://github.com/juxt #sql #kafka #datastorage #immutable
In this episode of The GeekNarrator podcast, host Kaivalya Apte dives into the complexities of testing distributed systems with Will Wilson from Antithesis. If you’re grappling with the challenges of testing databases, microservices, and distributed systems, this episode is a must-watch. Will Wilson demystifies the concept of deterministic simulation testing, shares insights about its advantages over conventional testing methods, and explains how Antithesis helps developers ensure software reliability. Learn about the various strategies and techniques used to identify and resolve bugs, and explore how deterministic simulation can transform your software testing approach. Perfect for developers, engineers, and tech enthusiasts who are keen on improving their testing methodologies for complex systems. Chapters: 00:00 Introduction 03:04 Limitations of Conventional Testing Methods 04:09 Understanding Deterministic Simulation Testing 08:07 Implementing Deterministic Simulation Testing 14:30 Real-World Example: Chat Application 19:56 Antithesis Hypervisor and Determinism 27:06 Defining Properties and Assertions 38:34 Optimizing Snapshot Efficiency 40:44 Understanding Isolation in CI/CD Pipelines 43:39 Strategies for Effective Bug Detection 47:59 Exploring Program State Trees 51:17 Heuristics and Fuzzing Techniques 01:01:56 Mocking Third-Party APIs 01:05:54 Handling Long-Running Tests 01:09:06 Classifying and Prioritizing Bugs 01:15:35 Future Plans and Closing Remarks References: Hypervisor: https://antithesis.com/blog/deterministic_hypervisor/ AFL: https://github.com/google/AFL Antithesis website: https://antithesis.com/ Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
#distributedsystems #databases #microservices #antithesis #fuzzer #testing
Exploring Turso with Glauber Costa: Insights on SQLite for Production. In this episode of The GeekNarrator podcast, host Kaivalya Apte interviews Glauber Costa, founder and CEO of TursoDB. They discuss the inception of TursoDB, Glauber's background in Linux kernel development, and the journey from unikernel projects to founding a database company. Glauber explains TursoDB's enhancements to SQLite for production use, including native replication, schema management, and vector search capabilities. The conversation dives deep into use cases, architecture, and the benefits of a multi-tenant database design. Learn about TursoDB’s future plans and essential insights for developers. Chapters: 00:00 Introduction 05:05 The Birth of Turso 08:02 Challenges and Pivot to libSQL 17:12 SQLite for Production: Enhancements and Features 22:02 Replication and Backup Solutions 23:38 Enterprise-Level Features and Multi-Tenancy 25:55 User Experience and Simplicity of TursoDB 33:14 Handling Network Failures and Monitoring 36:35 Native Replication in SQLite 37:52 Virtualizing the Write-Ahead Log 39:20 Replication Mechanisms 41:31 Primary and Replica Dynamics 46:51 Multi-Tenancy and Scalability 53:33 Schema Changes and Migrations 58:51 Vector Search Capabilities 01:02:13 Future Roadmap and Features #distributedsystems #databases #sqlite #sql
Deep Dive into Serverless Databases with Neon: Featuring Heikki Linnakangas. In this episode of The GeekNarrator podcast, host Kaivalya Apte is joined by Heikki Linnakangas, co-founder of Neon, to explore the innovative world of serverless databases. They discuss Neon's unique approach to separating compute and storage, the benefits of serverless architecture for modern applications, and a range of compelling use cases. They also cover Neon's architectural features like branching, auto-scaling, and auto-suspend, which make it a powerful tool for both developers and enterprises. Whether you're curious about multi-tenancy, fault tolerance, or developer productivity, this episode offers insight into leveraging Neon's capabilities for your next project. Chapters: 00:00 Introduction 00:53 The Birth of Neon: Why It Was Created 02:16 Understanding Serverless Databases 07:06 Neon's Architecture: Separation of Compute and Storage 09:59 Exploring Branching in Neon 18:21 Auto Scaling and Handling Spikes in Traffic 20:17 The Challenge of Multiple Writers in Distributed Systems 22:51 Auto Suspend: Cost-Effective Database Management 26:02 Optimizing Cold Start Times 27:14 Balancing Cost and Performance 28:52 Replication and Durability 30:32 Understanding the Storage Layer 34:02 Custom LSM Tree Implementation 36:21 Fault Tolerance and Failover 07:00 Developer Productivity and Use Cases 42:56 Migration and Tooling 48:35 Future Roadmap and User Experience 50:28 Conclusion and Final Thoughts Neon website: https://neon.tech/
#PostgreSQL #SQL #RDBMS #NEON
In this video I speak with Felix GV, a Principal Staff Engineer at LinkedIn who has made major contributions to LinkedIn's data infrastructure, including VeniceDB. This episode will give you a good understanding of why we need a new database for storing "Derived Data" with low latency and high performance, which is very important for Machine Learning workloads. Chapters: 00:00 Introduction 01:42 The Evolution of LinkedIn's Databases 03:15 Challenges with Voldemort and the Birth of VeniceDB 08:42 Understanding Derived Data 13:33 Planet-Scale Applications and Multi-Region Support 17:40 Writing Data into VeniceDB 22:53 Merging Data in VeniceDB 40:31 Understanding the Architecture 40:47 Components of the Write Path 41:56 Leader and Follower Architecture 43:58 Partitioning and DaVinci Client 47:57 Read Patterns and Client Options 54:25 Fault Tolerance and Recommender Systems 01:01:19 Kafka Integration and Deployment 01:06:56 Roadmap and Future Improvements Important links: VeniceDB blog: https://www.linkedin.com/blog/engineering/open-source/open-sourcing-venice-linkedin-s-derived-data-platform VeniceDB docs: https://venicedb.org/ QCon: https://youtu.be/pJeg4V3JgYo?si=vblGUxp5fNdKPHoC #kafka #linkedin #venicedb #Rocksdb
In this video I speak with Philippe Noël about ParadeDB, an Elasticsearch alternative built on Postgres that modernizes the features of Elasticsearch's product suite, starting with real-time search and analytics. I hope you enjoy it and learn about the product. Chapters: 00:00 Introduction 01:12 Challenges with Elasticsearch and the Need for ParadeDB 02:29 Why Postgres? 06:30 Technical Details of ParadeDB's Search Functionality 18:25 Analytics Capabilities of ParadeDB 24:00 Understanding ParadeDB Queries and Transactions 24:22 Application Logic and Data Workflows 25:14 Using pg_cron for Data Migration 30:05 Scaling Reads and Writes in Postgres 31:53 High Availability and Distributed Systems 34:31 Isolation of Workloads 39:38 Database Upgrades and Migrations 41:21 Using ParadeDB Extensions and Distributions 43:02 Observability and Monitoring 44:42 Upcoming Features and Roadmap 46:34 Final Thoughts Important links: GitHub: https://github.com/paradedb/paradedb Website: https://paradedb.com Docs: https://docs.paradedb.com/ Blog: https://blog.paradedb.com #postgresql #datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign #elasticsearch
In this video I speak with Andrew Lamb, Staff Software Engineer @InfluxDB. We discuss the FDAP (Flight, DataFusion, Arrow, Parquet) stack for modern OLAP database system design. Andrew shares insights into why the FDAP stack is so powerful for designing and implementing a modern OLAP database. Chapters: 00:00 Introduction 01:48 Understanding Analytics: Transactional vs Analytical Databases 04:41 The Genesis and Goals of the FDAP Stack 09:31 Decoding FDAP: Flight, DataFusion, Arrow, and Parquet 12:40 Apache Parquet: Revolutionizing Columnar Storage 17:18 Apache Arrow: The In-Memory Game Changer 23:51 Interoperability and Migration with Apache Arrow 27:10 Comparing Apache Parquet and Arrow 28:26 Exploring Data Mutability in Analytic Systems 29:19 Handling Data Updates and Deletions 29:24 The Role of Immutable Storage in Analytics 30:42 Optimizing Data Storage and Mutation Strategies 34:20 Introducing Flight: Simplifying Data Transfer 35:02 Deep Dive into Flight's Benefits and SQL Support 39:20 Unpacking DataFusion's SQL Support and Extensibility 46:12 The Interplay of FDAP Components in Analytics 51:49 Future Directions and Innovations in Data Analytics 56:04 Concluding Thoughts on FDAP and Its Impact FDAP Stack: https://www.influxdata.com/glossary/fdap-stack/ FDAP Blog: https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/ InfluxDB: https://www.influxdata.com/
#datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign