Core AI Concepts – Part 2

Update: 2025-08-19

Description

In this episode, Lois Houston and Nikita Abraham continue their discussion on AI fundamentals, diving into Data Science with Principal AI/ML Instructor Himanshu Raj. They explore key concepts like data collection, cleaning, and analysis, and talk about how quality data drives impactful insights.

Oracle University Learning Community: https://education.oracle.com/ou-community

Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode.
----------------------------------------------------------------
Episode Transcript:

00:00

Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!

00:25

Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Team Lead: Editorial Services. 

Nikita: Hi everyone! Last week, we began our exploration of core AI concepts, specifically machine learning and deep learning. I’d really encourage you to go back and listen to the episode if you missed it.  

00:52

Lois: Yeah, today we’re continuing that discussion, focusing on data science, with our Principal AI/ML Instructor Himanshu Raj. 

Nikita: Hi Himanshu! Thanks for joining us again. So, let’s get cracking! What is data science? 

01:06

Himanshu: It's about collecting, organizing, analyzing, and interpreting data to uncover valuable insights that help us make better business decisions. Think of data science as the engine that transforms raw information into strategic action. 

You can think of a data scientist as a detective. They gather clues, which is our data, connect the dots between those clues, and ultimately solve the mystery, meaning they find hidden patterns that can drive value. 

01:33

Nikita: Ok, and how does this happen exactly? 

Himanshu: Just like a detective relies on both instincts and evidence, data science blends domain expertise and analytical techniques. First, we collect raw data. Then we prepare and clean it because messy data leads to messy conclusions. Next, we analyze to find meaningful patterns in that data. And finally, we turn those patterns into actionable insights that businesses can trust. 

02:00

Lois: So what you’re saying is, data science is not just about technology; it's about turning information into intelligence that organizations can act on. Can you walk us through the typical steps a data scientist follows in a real-world project? 

Himanshu: So it all begins with business understanding. Identifying the real problem we are trying to solve. It's not about collecting data blindly. It's about asking the right business questions first. And once we know the problem, we move to data collection, which is gathering the relevant data from available sources, whether internal or external. 

Next comes data cleaning, probably the least glamorous but one of the most important steps. This is where we fix missing values, remove errors, and ensure that the data is usable. Then we perform data analysis, or what we call exploratory data analysis. 

Here we look for patterns, trends, and initial signals hidden inside the data. After that comes modeling and evaluation, where we apply machine learning or deep learning techniques to predict, classify, or forecast outcomes. Machine learning and deep learning are like specialized equipment in a data science detective's toolkit: powerful, but not the whole investigation. 

We also check how good the models are in terms of accuracy, relevance, and business usefulness. Finally, if the model meets expectations, we move to deployment and monitoring, putting the model into real world use and continuously watching how it performs over time. 
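
To make the modeling and evaluation step concrete, here is a minimal sketch in Python using scikit-learn. It is an illustration only: the built-in dataset and the random forest model are assumptions standing in for whatever data and technique a real project would choose.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A built-in dataset stands in for data that has already been
# collected and cleaned in the earlier lifecycle steps.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so evaluation reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Evaluation: accuracy is one check; relevance and business
# usefulness still need human judgment.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

After deployment, the same kind of evaluation would be repeated on live data as part of monitoring.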

03:34

Nikita: So, it’s a linear process? 

Himanshu: It's not linear. That's because in real world data science projects, the process does not stop after deployment. Once the model is live, business needs may evolve, new data may become available, or unexpected patterns may emerge. 

And that's why we come back to business understanding again, redefining the questions, the strategy, and sometimes even the goals based on what we have learned. In a way, a good data science project behaves like a living system that grows, adapts, and improves over time. Continuous improvement keeps it aligned with business value.  

Now, think of it like adjusting your GPS while driving. The route you plan initially might change as new traffic data comes in. Similarly, in data science, new information constantly helps refine our course. And the quality of our data determines the quality of our results.  

If the data we feed into our models is messy, inaccurate, or incomplete, then the outputs, no matter how sophisticated the technology, will also be unreliable. This concept is often called garbage in, garbage out: bad input leads to bad output. 

Now, think of it like cooking. Even the world's best Michelin star chef can't create a masterpiece with spoiled or poor-quality ingredients. In the same way, even the most advanced AI models can't perform well if the data they are trained on is flawed. 

05:05

Lois: Yeah, that's why high-quality data is not just nice to have, it’s absolutely essential. But Himanshu, what makes data good?  

Himanshu: Good data has a few essential qualities. The first is completeness: make sure we aren't missing any critical field. For example, every customer record must have a phone number and an email. The second is accuracy: the data should reflect reality. If a customer's address has changed, it must be updated, not left outdated. The third is consistency: similar data must follow the same format. Imagine if dates are written differently, like 2024/04/28 versus April 28, 2024. We must standardize them.  

The fourth is relevance: we collect only the data that actually helps solve our business question, not unnecessary noise. And the last is timeliness: data should be up to date. Using last year's purchase data for a real-time recommendation engine wouldn't be helpful. 
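
As a rough sketch of what checking these qualities can look like in code, here is a small Python example using pandas. The column names and records are assumptions invented for illustration, and the mixed-format date parsing shown requires pandas 2.x.

import pandas as pd

# Illustrative customer records; every field and value is an assumption.
customers = pd.DataFrame({
    "email": ["a@example.com", None, "c@example.com"],
    "phone": ["555-0100", "555-0101", None],
    "signup_date": ["2024/04/28", "April 28, 2024", "2024-05-03"],
})

# Completeness: flag records missing a critical field.
incomplete = customers[customers["email"].isna() | customers["phone"].isna()]
print(f"{len(incomplete)} records are missing a phone number or an email")

# Consistency: standardize mixed date formats into one representation
# (format="mixed" needs pandas 2.x).
customers["signup_date"] = pd.to_datetime(
    customers["signup_date"], format="mixed"
).dt.date
print(customers)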

06:13

Nikita: Ok, so ideally, we should use good data. But that’s a bit difficult in reality, right? Because what comes to us is often pretty messy. So, how do we convert bad data into good data? I’m sure there are processes we use to do this. 

Himanshu: First one is cleaning. So this is about correcting simple mistakes, like fixing typos in city names or standardizing dates. 

The second one is imputation. So if some values are missing, we fill them intelligently, for instance, using the average income for a missing salary field. Third one is filtering. In this, we remove irrelevant or noisy records, like discarding fake email signups from marketing data. The fourth one is enriching. We can even enhance our data by adding trusted external sources, like appending credit scores from a verified bureau. 

And the last one is transformation. Here, we reshape data formats to be consistent, for example, converting all amounts to the same currency. So even messy data can become usable, but it takes deliberate effort, a structured process, and attention to quality at every step. 
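
Here is a similarly hedged pandas sketch of those five steps. The dataset, the disposable-email rule, and the exchange rate are all made-up assumptions for the example.

import pandas as pd

# Toy marketing dataset; all fields and values are illustrative.
records = pd.DataFrame({
    "city": ["new york", "New York", "boston"],
    "income": [55000.0, None, 48000.0],
    "email": ["a@example.com", "b@mailinator.com", "c@example.com"],
    "price_eur": [10.0, 20.0, 15.0],
})

# Cleaning: fix simple inconsistencies, such as city-name casing.
records["city"] = records["city"].str.title()

# Imputation: fill missing incomes with the mean of the known values.
records["income"] = records["income"].fillna(records["income"].mean())

# Filtering: drop signups from a disposable-email domain (assumed rule).
records = records[~records["email"].str.endswith("@mailinator.com")]

# Enriching would merge trusted external data here, e.g. records.merge(...).

# Transformation: convert all prices to one currency (assumed EUR->USD rate).
records["price_usd"] = records["price_eur"] * 1.08

print(records)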

07:26

Oracle University’s Race to Certification 2025 is your ticket to free training and certification in today’s hottest technology. Whether you’re starting with Artificial Intelligence, Oracle Cloud Infrastructure, Multicloud, or Oracle Data Platform, this challenge covers it all! Learn more about your chance to win prizes and see your name on the Leaderboard by visiting education.oracle.com/race-to-certification-2025. That’s education.oracle.com/race-to-certification-2025.

08:10

Nikita: Welcome back! Himanshu, we spoke about how to clean data. Now, once we get high-quality data, how do we analyze it? 

Himanshu: In data science, there are four primary types of analysis we typically apply depending on the business goal we are trying to achieve. 

The first one is descriptive analysis. It helps summarize and report what has happened, often using averages, totals, or percentages. For example, retailers use descriptive analysis to answer questions like: what was the average customer spend last quarter? How did store foot traffic trend across the months? 
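
A minimal descriptive-analysis sketch in pandas, assuming a made-up retail transactions table:

import pandas as pd

# Hypothetical transactions; the columns and figures are assumptions.
sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "customer_spend": [120.0, 80.0, 150.0, 90.0],
})

# Descriptive analysis: summarize what happened with averages and totals.
print(sales.groupby("quarter")["customer_spend"].agg(["mean", "sum"]))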

The second one is diagnostic analysis. Diagnostic analysis digs deeper into why something happened. For example, hospitals use it to find out why a certain department has higher patient readmission rates. Was it due to staffing, post-treatment care, or patient demographics? 
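
And a hedged sketch of diagnostic analysis, slicing an assumed readmission outcome by one candidate cause:

import pandas as pd

# Hypothetical patient records; every field and value is an assumption.
patients = pd.DataFrame({
    "staffing_level": ["low", "low", "high", "high"],
    "readmitted": [1, 1, 0, 1],
})

# Diagnostic analysis: compare the outcome across a suspected factor
# to explore why readmissions differ, not just report that they do.
print(patients.groupby("staffing_level")["readmitted"].mean())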

The third one is predictive analysis. Predictive analysis looks forward, trying to forecast future outcomes based on historical patterns. For example, energy companies predict future electricity demand, so they can better m
