Bringing scalable real-time analytics to the enterprise
Description
In this episode of the Data Show, I spoke with Dhruba Borthakur (co-founder and CTO) and Shruti Bhat (SVP of Product) of Rockset, a startup that builds solutions for interactive data science and live applications. Borthakur was the founding engineer of HDFS and the creator of RocksDB, while Bhat is an experienced product and marketing executive specializing in enterprise software and data products. Their new startup addresses a few trends I’ve recently been thinking about, including the re-emergence of real-time analytics and the hunger for simpler data architectures and tools. Borthakur exemplifies the need for companies to continually evaluate new technologies: while he was the founding engineer of HDFS, these days he mostly works with object stores like S3.
We had a great conversation spanning many topics, including:
- RocksDB, an open source, embeddable key-value store that originated at Facebook and is used in several other open source projects.
- Time-series databases.
- The importance of having solutions for real-time analytics, particularly now with the renewed interest in IoT applications and the rollout of 5G technologies.
- Use cases for Rockset’s technologies—and more generally, applications of real-time analytics.
- The Aggregator Leaf Tailer architecture as an alternative to the Lambda architecture.
- Building data infrastructure in the cloud.
<figure><figcaption>The Aggregator Leaf Tailer (“CQRS for the data world”): A data architecture favored by web-scale companies. Source: Dhruba Borthakur, used with permission.</figcaption></figure>
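For readers who want a concrete picture of the "CQRS" idea in the caption above, here is a minimal, illustrative sketch in Python. It assumes the commonly described split of the Aggregator Leaf Tailer pattern: a tailer that ingests new records (the write path), leaves that hold indexed data, and an aggregator that fans queries out to the leaves and merges the results (the read path). The class names, the round-robin placement, and the toy inverted index are all assumptions made for illustration; they are not Rockset's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Holds indexed data: a toy inverted index from (field, value) to doc ids."""
    docs: dict = field(default_factory=dict)
    index: dict = field(default_factory=lambda: defaultdict(set))

    def write(self, doc_id, doc):
        # Store the document and index every field as it arrives.
        self.docs[doc_id] = doc
        for key, value in doc.items():
            self.index[(key, value)].add(doc_id)

    def lookup(self, key, value):
        # Answer a point query from the index without scanning all documents.
        return [self.docs[i] for i in sorted(self.index.get((key, value), ()))]


class Tailer:
    """Write path: takes new records from a source and hands them to leaves."""

    def __init__(self, leaves):
        self.leaves = leaves
        self.next_id = 0

    def ingest(self, records):
        for record in records:
            # Hypothetical placement policy: simple round-robin across leaves.
            leaf = self.leaves[self.next_id % len(self.leaves)]
            leaf.write(self.next_id, record)
            self.next_id += 1


class Aggregator:
    """Read path: fans a query out to every leaf and merges partial results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def count_where(self, key, value):
        return sum(len(leaf.lookup(key, value)) for leaf in self.leaves)


leaves = [Leaf(), Leaf()]
Tailer(leaves).ingest([
    {"device": "a", "status": "ok"},
    {"device": "b", "status": "error"},
    {"device": "c", "status": "ok"},
])
ok_count = Aggregator(leaves).count_where("status", "ok")
print(ok_count)
```

Because ingestion and querying go through separate components, each side could in principle be scaled on its own, which is the property the CQRS comparison points at.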
Related resources:
- Serverless Streaming Architectures & Algorithms for the Enterprise – a new tutorial on September 24th at Strata Data NYC.
- “Becoming a machine learning company means investing in foundational technologies”
- Haoyuan Li: “In the age of AI, fundamental value resides in data”
- Harish Doddi: “Simplifying machine learning lifecycle management”
- Eric Jonas: “A Berkeley view on serverless computing”
- “Specialized tools for machine learning development and model governance are becoming essential”
- Avner Braverman: “What data scientists and data engineers can do with current generation serverless technologies”