Defining a Database with Tony Baer
Tony Baer, Principal at dbInsight, joins Corey on Screaming in the Cloud to discuss his definition of what is and isn’t a database, and the trends he’s seeing in the industry. Tony explains why it’s important to try and have an outsider’s perspective when evaluating new ideas, and the growing awareness of the impact data has on our daily lives. Corey and Tony discuss the importance of working towards true operational simplicity in the cloud, and Tony also shares why explainability in generative AI is so crucial as the technology advances.
Tony Baer, the founder and CEO of dbInsight, is a recognized industry expert in extending data management practices, governance, and advanced analytics to address the desire of enterprises to generate meaningful value from data-driven transformation. His combined expertise in both legacy database technologies and emerging cloud and analytics technologies shapes how clients go to market in an industry undergoing significant transformation.
During his 10 years as a principal analyst at Ovum, he established successful research practices in the firm’s fastest growing categories, including big data, cloud data management, and product lifecycle management. He advised Ovum clients regarding product roadmap, positioning, and messaging and helped them understand how to evolve data management and analytic strategies as the cloud, big data, and AI moved the goal posts. Baer was one of Ovum’s most heavily-billed analysts and provided strategic counsel to enterprises spanning the Fortune 100 to fast-growing privately held companies.
With the cloud transforming the competitive landscape for database and analytics providers, Baer led deep dive research on the data platform portfolios of AWS, Microsoft Azure, and Google Cloud, and on how cloud transformation changed the roadmaps for incumbents such as Oracle, IBM, SAP, and Teradata. While at Ovum, he originated the term “Fast Data” which has since become synonymous with real-time streaming analytics.
Baer’s thought leadership and broad market influence in big data and analytics have been formally recognized on numerous occasions. Analytics Insight named him one of the 2019 Top 100 Artificial Intelligence and Big Data Influencers. Previous citations include Onalytica, which named Baer as one of the world’s Top 20 thought leaders and influencers on Data Science; Analytics Week, which named him as one of 200 top thought leaders in Big Data and Analytics; and KDnuggets, which listed Baer as one of the Top 12 data analytics thought leaders on Twitter. While at Ovum, Baer was Ovum IT’s most visible and publicly quoted analyst, and was cited by Ovum’s parent company Informa as Brand Ambassador in 2017. In raw numbers, Baer has 14,000 followers on Twitter, and his ZDNet “Big on Data” posts are read 20,000 – 30,000 times monthly. He is also a frequent speaker at industry conferences such as Strata Data and Spark Summit.
- dbInsight: https://dbinsight.io/
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is brought to us in part by our friends at Red Hat. As your organization grows, so does the complexity of your IT resources. You need a flexible solution that lets you deploy, manage, and scale workloads throughout your entire ecosystem. The Red Hat Ansible Automation Platform simplifies the management of applications and services across your hybrid infrastructure with one platform. Look for it on the AWS Marketplace.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Back in my early formative years, I was an SRE sysadmin type, and one of the areas I always avoided was databases, or frankly, anything stateful because I am clumsy and unlucky and that’s a bad combination to bring within spitting distance of anything that, you know, can’t be spun back up intact, like databases. So, as a result, I tend not to spend a lot of time historically living in that world. It’s time to expand horizons and think about this a little bit differently. My guest today is Tony Baer, principal at dbInsight. Tony, thank you for joining me.
Tony: Oh, Corey, thanks for having me. And by the way, we’ll try and basically knock down your primal fear of databases today. That’s my mission.
Corey: We’re going to instill new fears in you. Because I was looking through a lot of your work over the years, and the criticism I have—and always the best place to deliver criticism is massively in public—is that you take a very conservative, stodgy approach to defining a database, whereas I’m on the opposite side of the world. I contain information. You can ask me about it, which we’ll call querying. That’s right. I’m a database.
But I’ve never yet found myself listed in any of your analyses around various database options. So, what is your definition of databases these days? Where do they start and stop?
Tony: Oh, gosh.
Corey: Because anything can be a database if you hold it wrong.
Tony: [laugh]. I think one of the last things I’ve ever been called is conservative and stodgy, so this is certainly a way to basically put the thumbtack on my chair.
Corey: Exactly. I’m trying to normalize my own brand of lunacy, so we’ll see how it goes.
Tony: Exactly, because that’s the role I normally play with my clients. So, now the shoe is on the other foot. What I view a database as is basically a managed collection of data, and it’s managed to the point where essentially, a database should be transactional—in other words, when I basically put some data in, I should have some positive information, I should hopefully, depending on the type of database, have some sort of guidelines or schema or model for how I structure the data. So, I mean, database, you know, even though you keep hearing about unstructured data, the fact is—
Corey: Schemaless databases and data stores. Yeah, it was all the rage for a few years.
Tony: Yeah, except that they all have schemas, just that those schemaless databases just have very variable schema. They’re still schema.
Corey: A question that I have is you obviously think deeply about these things, which should not come as a surprise to anyone. It’s like, “Well, this is where I spend my entire career. Imagine that. I might think about the problem space a little bit.” But you have, to my understanding, never worked with databases in anger yourself. You don’t have a history as a DBA or as an engineer—
Corey: —but what I find very odd is that unlike a whole bunch of other analysts that I’m not going to name, but people know who I’m talking about regardless, you bring actual insights into this that I find useful and compelling, instead of reverting to the mean of well, I don’t actually understand how any of these things work in reality, so I’m just going to believe whoever sounds the most confident when I ask a bunch of people about these things. Are you just asking the right people who also happen to sound confident? But how do you get away from that very common analyst trap?
Tony: Well, a couple of things. One is I purposely play the role of outside observer. In other words, like, the idea is that if basically an idea is supposed to stand on its own legs, it has to make sense. If I’ve been working inside the industry, I might take too many things for granted. And a good example of this goes back, actually, to my early days—actually this goes back to my freshman year in college where I was taking an organic chem course for non-majors, and it was taught as a logic course not as a memorization course.
And we were given the option at the end of the term to either, basically, take a final or do a paper. So, of course, me being a writer I thought, I can BS my way through this. But what I found—and this is what fascinated me—is that as long as certain technical terms were defined for me, I found a logic to the way things work. And so, that really informs how I approach databases, how I approach technology today is I look at the logic on how things work. That being said, in order for me to understand that, I need to know twice as much as the next guy in order to be able to speak that because I just don’t do this in my sleep.
Corey: That goes a big step toward, I guess, addressing a lot of these things, but it also feels like—and maybe this is just me paying closer attention—that the world of databases and data and analytics have really coalesced or emerged in a very different way over the past decade-ish. It used to be, at least from my perspective, that oh, that the actual, all the data we store, that’s a storage admin problem. And that was about managing NetApps and SANs and the rest. And then you had the database side of it, which functionally from the storage side of the world was just a big file or series of files that are the backing store for the database. And okay, there’s not a lot of cross-communication going on there.
Then with the rise of object store, it started