On this episode, we sit down with Robert Hodges, a key figure in the ClickHouse community and a driving force behind its rise in the U.S., who joins us as our inaugural live guest. His collaboration with Alexander Zaitsev of Altinity has significantly boosted the visibility and adoption of ClickHouse since its open-source release by Yandex in 2016. From their early challenges with Kubernetes to the robust systems we see today, Robert shares invaluable insights into the transformative power of ClickHouse, which empowers businesses with real-time data capabilities for applications like recommendation engines and high-frequency trading strategies. Our conversation with Robert uncovers ClickHouse's unique architecture, highlighting its potential and the importance of understanding its columnar data organization. We dive into the practicalities and operational experience necessary to handle data efficiently, avoiding the pitfalls of costly schema changes and scalability issues. Robert weighs the choice between ZooKeeper and ClickHouse Keeper for cluster coordination, advocating for Keeper's modern advantages, especially as teams look to innovate beyond routine operations. As we explore the future, Robert shares his vision for data lakes and next-generation databases, touching on Altinity's pioneering work with serverless clusters and Kubernetes. Focusing on making data lakes more real-time and accessible, we delve into the innovations transforming data analytics. Robert highlights the community's vital role in these advancements and encourages engagement through Altinity's blog and Slack channel for those eager to participate in this dynamic evolution. Enjoy the conversation!
Unlock the secrets of ClickHouse's analytical prowess with our latest episode, where we promise you'll gain unparalleled insights into funnel analysis. Picture yourself boosting conversion rates at your online taco shop by mastering the windowFunnel function: track crucial user events like menu views, cart additions, and purchases, all within a specific time frame. Discover how to identify drop-off points in the funnel and address bottlenecks head-on, ensuring a smoother journey for your customers. Our engaging example of an online taco shop brings these concepts to life, showing you how to transform user journey data into actionable strategies. But that's not all: ClickHouse offers a treasure trove of features beyond funnel analysis. We dive into the versatility of the sequenceMatch function, a powerful tool for mapping out non-linear user journeys using pattern recognition. Explore how ClickHouse's array and window functions, as well as its machine learning capabilities, can push the boundaries of your data insights. As hosts, we're here to inspire you to maximize these potent tools, empowering ClickHouse developers to enhance user engagement and craft extraordinary user experiences. Join us as we unlock the full potential of ClickHouse, guiding you through a roadmap to smarter, more effective user journey analysis. This episode is sponsored by Propel, the easiest way to run ClickHouse, the world's fastest real-time analytics database, at any scale.
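To make the idea concrete, here is a minimal sketch of the kind of windowFunnel query discussed in the episode; the events table, its column names, and the one-hour window are hypothetical rather than taken from the episode itself.

-- For each user, windowFunnel counts how many of the ordered steps
-- (view menu -> add to cart -> purchase) occurred within 3600 seconds.
SELECT
    level,
    count() AS users
FROM
(
    SELECT
        user_id,
        windowFunnel(3600)(
            event_time,
            event_name = 'view_menu',
            event_name = 'add_to_cart',
            event_name = 'purchase'
        ) AS level
    FROM events
    GROUP BY user_id
)
GROUP BY level
ORDER BY level;

Users at level 3 completed a purchase; comparing the counts at levels 1, 2, and 3 shows exactly where the funnel leaks.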
Unlock the secrets to turbocharging your ClickHouse performance with our latest episode, where we promise to transform your taco order queries into a slick, high-speed operation. Through engaging examples and witty analogies, we take you on a journey to master the art of primary keys, not as guardians of uniqueness but as champions of query speed. Discover how arranging columns from low to high cardinality can revolutionize your data retrieval process, and learn practical strategies for managing hierarchical and time series data to boost efficiency and compression. As we navigate the intricacies of data management in ClickHouse, you'll gain insights into the unique operations that set it apart from traditional databases. Understand how data partitioning and data skipping indexes can supercharge your querying performance. Embrace the spirit of experimentation as we encourage you to explore and optimize, leaving you with a newfound confidence to fully leverage ClickHouse's capabilities. Whether you're a seasoned pro or a curious newcomer, this episode is packed with actionable tips to elevate your data game. This episode is sponsored by Propel. Propel is a serverless ClickHouse platform with APIs and embeddable UIs for developers to ship data apps in record time.
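As a rough sketch of how these ideas fit together (the taco_orders table and its columns are hypothetical), a table definition might order its sorting key from low to high cardinality, partition by month, and add a data skipping index:

-- store_id (low cardinality) comes before customer_id (high cardinality)
-- in ORDER BY; PARTITION BY groups parts by month; the minmax index lets
-- ClickHouse skip granules whose totals fall outside the queried range.
CREATE TABLE taco_orders
(
    order_date   Date,
    store_id     UInt32,
    customer_id  UInt64,
    order_id     UInt64,
    total_cents  UInt32,
    INDEX total_idx total_cents TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (store_id, order_date, customer_id);

Queries that filter on store_id and a date range can then read only the matching partitions and a small slice of each part.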
This episode of the ClickHouse Podcast discusses how to flatten DynamoDB JSON using ClickHouse to set up a scalable, analyzable pipeline. DynamoDB is a key-value and document database, while ClickHouse is a columnar store; flattening the nested DynamoDB JSON into columns makes the data significantly easier to query in ClickHouse. The process involves creating multiple materialized views that unpack the nested JSON structure and partition the data by table or entity. The episode walks through the steps: getting your DynamoDB events into ClickHouse, creating a materialized view for flattening, flattening individual tables, and handling single-table design. This approach ensures you have the flexibility to query your data efficiently and make the most of your event-driven data architecture. Want to dig deeper? https://www.propeldata.com/blog/flattening-dynamodb-json-in-clickhouse
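Here is a minimal sketch of the flattening step, assuming DynamoDB change events land in a dynamodb_events table with the new item image stored as a JSON string; all table and attribute names are hypothetical.

-- DynamoDB JSON wraps each attribute in a type descriptor ('S' for string,
-- 'N' for number), so the extraction path includes that key. The
-- materialized view flattens order events into typed columns as they arrive.
CREATE TABLE orders_flat
(
    order_id    String,
    customer_id String,
    total_cents UInt32,
    updated_at  DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY order_id;

CREATE MATERIALIZED VIEW orders_flat_mv TO orders_flat AS
SELECT
    JSONExtractString(NewImage, 'order_id', 'S')                            AS order_id,
    JSONExtractString(NewImage, 'customer_id', 'S')                         AS customer_id,
    toUInt32(JSONExtractString(NewImage, 'total_cents', 'N'))               AS total_cents,
    parseDateTimeBestEffort(JSONExtractString(NewImage, 'updated_at', 'S')) AS updated_at
FROM dynamodb_events
WHERE JSONExtractString(NewImage, 'entity_type', 'S') = 'order';

The WHERE clause is what separates entities in a single-table design: one materialized view per entity type, each writing to its own flat table.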
In this episode of the ClickHouse Podcast, the hosts explore the ReplacingMergeTree table engine in ClickHouse. ReplacingMergeTree is designed to handle mutable data: instead of keeping every inserted version of a row, it deduplicates rows that share the same sorting key during background merges, retaining only the latest version and discarding outdated ones. This engine is useful for cases like real-time updates, deduplication, and slowly changing dimensions. The hosts emphasize the importance of carefully defining the sorting key using the ORDER BY clause to optimize both query performance and data uniqueness. While ReplacingMergeTree offers powerful features for managing mutable data, considerations include merge timing, storage impact, and row count inflation before merges occur. For querying, the FINAL modifier ensures the latest version is retrieved but can impact performance. The episode concludes with best practices for using ReplacingMergeTree efficiently and hints at its potential for real-time data synchronization from OLTP systems like MySQL or PostgreSQL. Looking for more information on ReplacingMergeTree? https://www.propeldata.com/blog/understanding-replacingmergetree-in-clickhouse
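A minimal sketch of the pattern, assuming a hypothetical user_profiles table keyed by user_id with an explicit version column:

-- On background merges, ClickHouse keeps only the row with the highest
-- version value for each sorting key (user_id).
CREATE TABLE user_profiles
(
    user_id UInt64,
    email   String,
    version UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- FINAL deduplicates at query time, so you see the latest version even
-- before merges run, at some cost in query performance.
SELECT * FROM user_profiles FINAL WHERE user_id = 42;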
In this podcast episode, we dive into the latest release of ClickHouse and dissect the changelog. ClickHouse version 24.9 introduces significant improvements to security, usability, and performance. New features include multiple authentication methods per user, overlay string functions, and partition-level lightweight deletes. The update includes performance enhancements for JOIN operations, filesystem cache optimization, and parallel merge for the uniq aggregate function. It is essential to be aware of potential backward-incompatible changes, particularly around tuple expressions and replicated databases. Experimental features include dynamic JSON path introspection, improved refreshable materialized views, and a new min_max statistics type.
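As a quick illustration of one of these features, a partition-scoped lightweight delete might look like the sketch below; the orders table, its monthly partitioning, and the partition value shown are hypothetical, and the IN PARTITION clause follows the documented lightweight DELETE syntax.

-- Marks matching rows as deleted in a single monthly partition instead of
-- scanning the whole table; the rows are physically removed by later merges.
DELETE FROM orders IN PARTITION '202409'
WHERE status = 'cancelled';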
This episode explains how to optimize ClickHouse performance for fast and reliable analytics. The podcast focuses on ClickHouse's unique features, such as columnar storage, full-fledged DBMS capabilities, and distributed architecture, as crucial elements for achieving optimal performance. The hosts dive deep into four specific optimization strategies: designing the database schema, optimizing queries, implementing data distribution and replication, and leveraging tiered storage. Want to learn more? https://www.propeldata.com/blog/clickhouse-performance-optimizations-explained
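To ground the tiered storage idea, here is a minimal sketch; it assumes the server is configured with a storage policy named 'hot_and_cold' that includes a 'cold' volume, and the table and column names are hypothetical.

-- Recent data stays on the default (fast) volume; parts older than 90 days
-- are moved to the 'cold' volume automatically by the TTL rule.
CREATE TABLE events_tiered
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 90 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_and_cold';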