HeatWave Hot Takes: The Power of ML and GenAI
Description
In this episode, leFred and Scott welcome Jayant Sharma and Sanjay Jinturkar to the Sakila Studio for an insightful conversation on machine learning and generative AI within HeatWave. Discover how these cutting-edge technologies are integrated, what makes HeatWave unique, and how organizations can leverage its capabilities to unlock new possibilities in data and AI. Tune in for practical insights, real-world use cases, and a closer look at the future of analytics.
------------------------------------------------------------
Episode Transcript:
00:00:00 :00 - 00:00:32 :01
Welcome to Inside MySQL: Sakila Speaks. A podcast dedicated to all things MySQL. We bring you the latest news from the MySQL team, MySQL project updates and insightful interviews with members of the MySQL community. Sit back and enjoy as your hosts bring you the latest updates on your favorite open source database. Let's get started!
00:00:32 :03 - 00:00:54 :17
Hello and welcome to Sakila Speaks, the podcast dedicated to MySQL. I am leFred, and I'm Scott Stroz. Today, for the second episode of season three dedicated to AI, I am pleased to welcome Sanjay Jinturkar. Sorry if I pronounce it badly. No, you did it right. Hi there. Thank you. So Sanjay is a senior director at Oracle based in New Jersey.
00:00:54 :19 - 00:01:21 :13
He leads product development for HeatWave AutoML and GenAI, with a strong focus on integrating these technologies directly into the HeatWave database. Sanjay has been instrumental in enhancing HeatWave's machine learning and GenAI tool sets, enabling use cases like predictive maintenance, fraud detection, and intelligent document Q&A. And we also have a second guest today.
00:01:21 :13 - 00:01:48 :21
It's Jayant Sharma. Hi, Jayant. Hello. So Jayant Sharma is a senior director of product management at Oracle. He has over 20 years of experience in databases, spatial analytics, and application development. He's currently focused on the product strategy and design of the HeatWave MySQL managed services offering. Hey, Fred. Thank you, both of you, for joining us today. So I'm going to dive right in with a question for Jayant.
00:01:48 :23 - 00:02:12 :14
Why did Oracle decide to integrate machine learning and generative AI capabilities directly into HeatWave? Thank you, Scott, first for this opportunity. And yes, we have to start, you know, by talking about MySQL, right? MySQL is the world's most popular open source database. And what do all of these customers, the thousands of customers that it has, do with it?
00:02:12 :16 - 00:02:47 :05
They manage a business process. They manage their enterprise, right? Their focus is on what they want to do and why they want to do it, and not so much the how. That's what MySQL makes easier. And HeatWave is a managed service on MySQL. Okay, so as folks are modernizing their applications, taking advantage of new technology, they want to be able to use new workloads, new analytics, and modernize their business processes, make them more efficient, make them more effective.
00:02:47 :07 - 00:03:09 :17
In order to do that, they want to do things such as machine learning and use the benefits of generative AI. However, what they want to focus on, as we said, is what they want and why they want to do it, and not the how. So they don't want to have to think about: I have all of this data that's potentially a goldmine.
00:03:09 :19 - 00:03:40 :07
How do I extract nuggets from it, and how do I safely move and transfer it between best-of-breed tools? They want to be able to do things where they are: I want to bring these new capabilities to my data; I don't want to take my data to where those capabilities are exposed, right? That is why we made it possible to do machine learning and GenAI where your gold mine is, where your data is: in MySQL, in HeatWave.
00:03:40 :09 - 00:04:06 :07
Awesome. Thank you. So I would like to ask you, Sanjay, then: how does the machine learning engine in HeatWave differ from using external machine learning pipelines with the data we have in the database? It differs in a couple of ways, specifically how the models are built, who builds them, and where they are built.
00:04:06 :09 - 00:04:46 :09
So we provide an automated pipeline, which can take your data in the MySQL database or Lakehouse and automatically generate the model for you. It does the usual tasks of pre-processing, hyperparameter optimization, data cleansing, etc. automatically, so that the user doesn't have to. We will even go ahead and generate explanations for you in certain use cases. Given that this is automated, a big side effect is that users don't need to be experts in machine learning.
00:04:46 :11 - 00:05:16 :08
What they need to focus on is their business problem, and how that business problem maps onto one of the features that we provide. From there onwards, the pipeline takes over and generates the models for them. And the third piece is that all of this work is done within HeatWave. We don't take the data out; going back to what Jayant was saying, we have brought machine learning and generative AI to where the data resides, not the other way around.
00:05:16 :10 - 00:05:47 :20
So we are building the models inside HeatWave, where the data is not taken out, and thereby it is more secure: the user does not have to worry about data leakage, or about tracking where all they have taken the data and how many times they have done it. So these are the three key ways in which we differ. If you use one of the third-party solutions, they will end up asking you to do this on your own, or asking you to take the data out of the database and build the model on your machine, and so on and so forth.
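For readers who want to picture what that in-database workflow looks like, here is a minimal sketch using HeatWave AutoML's sys.ML_TRAIN routine; the schema, table, and target column names are hypothetical, and option details vary slightly between HeatWave versions.

    -- Train a classification model entirely inside HeatWave:
    -- the training data never leaves the database.
    -- 'demo.orders_train' and the target column 'churned' are hypothetical.
    CALL sys.ML_TRAIN('demo.orders_train', 'churned',
                      JSON_OBJECT('task', 'classification'),
                      @churn_model);
    -- AutoML chooses the algorithm and hyperparameters automatically;
    -- @churn_model now holds the handle of the generated model.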
00:05:47 :22 - 00:06:21 :06
But we have made it automated, easy to use, and very secure. So Sanjay, we're going to stay with you to keep talking about AutoML in HeatWave. What are some of the key features of AutoML, and how does it simplify model training and deployment for users? Fantastic question. You know, as I said in the previous answer, we are hitting the common tasks that are associated with model training and deployment.
00:06:21 :08 - 00:06:46 :03
So let's take training here. Typically, when users have to train a model, they are going to take their data, clean it up, and do some pre-processing. Then they will figure out which particular algorithm they should be using, tune those algorithms by doing the hyperparameter tuning, and so on and so forth. All of these are individual tasks.
00:06:46 :05 - 00:07:12 :15
Our goal is to have users focus on their business problem and take away the engineering piece of it, take away the technology piece of it, and do it automatically for them. So we have this pipeline which does all of it automatically in a single pass. It will do pre-processing. It's going to figure out the appropriate algorithm to use during model building.
00:07:12 :17 - 00:07:39 :05
It will figure out the best set of hyperparameters and what their values should be during the training process, and give you the model. So that's one part. The second part is that we provide the ability to deploy these models via REST interfaces. So once the model is trained, they can deploy it.
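As a rough illustration of the serving side, this is what loading a trained model and scoring with it looks like in SQL inside HeatWave; the REST interface Sanjay mentions exposes the same models over HTTP. Table and column names here are hypothetical, and the trailing options argument differs across versions.

    -- Load the trained model into HeatWave memory before scoring.
    CALL sys.ML_MODEL_LOAD(@churn_model, NULL);

    -- Batch inference: write predictions for a hypothetical new-orders table.
    CALL sys.ML_PREDICT_TABLE('demo.orders_new', @churn_model,
                              'demo.orders_predictions', NULL);

    -- Or score a single row ad hoc (feature names are hypothetical).
    SELECT sys.ML_PREDICT_ROW(
             JSON_OBJECT('amount', 120.50, 'visits', 7),
             @churn_model, NULL);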
00:07:39 :07 - 00:08:09 :09
And thirdly, from time to time, the user's data is going to drift. What I mean by that is that the data on which the model was trained no longer reflects reality. In that case, you have to retrain the model. So we provide tools to measure that drift, and if it goes beyond a certain threshold, then you can go ahead and retrain your model automatically.
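HeatWave's own drift tooling is what Sanjay describes; purely as a sketch of the underlying idea, one way to approximate it is to periodically re-score the model on fresh data with sys.ML_SCORE and retrain when quality drops. The tables, metric, and threshold below are assumptions for illustration.

    -- Evaluate the current model against recent data (hypothetical table).
    CALL sys.ML_SCORE('demo.orders_recent', 'churned', @churn_model,
                      'balanced_accuracy', @score, NULL);
    SELECT @score;

    -- If @score has fallen below an agreed threshold (say 0.75),
    -- retrain on refreshed data to produce a new model handle.
    CALL sys.ML_TRAIN('demo.orders_train_refreshed', 'churned',
                      JSON_OBJECT('task', 'classification'),
                      @churn_model_v2);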
00:08:09 :11 - 00:08:53 :01
So these are a couple of ways in which we have simplified model training and deployment for users. Thank you. Thank you very much for this detailed answer. And now, since we discussed the data not leaving to a third-party product, I would like to ask Jayant if there are some performance improvements that users have seen by doing this ML natively in HeatWave, instead of moving the data to external platforms. Certainly, Fred.
00:08:53 :03 - 00:09:24 :01
So there are two aspects to this. There are efficiencies that result, and there are performance improvements, because of the way AutoML is implemented and how it works in HeatWave. Let's start with the efficiency first. The first thing, as Sanjay was saying, right, is that we've automated the pipeline. You have to focus only on what your business problem is and how that maps to a particular task in machine learning.
00:09:24 :01 - 00:09:47 :04
So for example: do I want to predict something, and therefore use regression? Do I want to identify or label something, and therefore use classification? And AutoML will figure out which particular algorithm to use. There are multiple ways in which you may do regression, for example; it works out which particular one applies or is best suited for the task at hand, right?
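That business-problem-to-task mapping is just the 'task' option in the training call shown earlier; a brief hedged sketch, again with hypothetical tables and target columns.

    -- Predict a numeric value -> regression.
    CALL sys.ML_TRAIN('demo.sales_history', 'revenue',
                      JSON_OBJECT('task', 'regression'), @revenue_model);

    -- Assign a label or category -> classification.
    CALL sys.ML_TRAIN('demo.transactions', 'is_fraud',
                      JSON_OBJECT('task', 'classification'), @fraud_model);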
00:09:47 :04 - 00:10:15 :06
So the efficiency there is that AutoML handles it in a single pass, whereas the normal process requires you to iterate, to do things multiple times: try multiple algorithms or different ways of solving the same problem, and then evaluate which one does it best. AutoML does this in a single pass, by very smart ways of sampling your data and running quick tests to identify the best approach.
00:10:15 :08 - 00:10:35 :15
So that's the efficiency. The second aspect: when it does this, why is it so fast? It's so fast because it uses the full capability of the underlying infrastructure, which is the HeatWave nodes, right?