The AI Workflow

The AI Workflow

Update: 2025-09-02
Share

Description

Join Lois Houston and Nikita Abraham as they chat with Yunus Mohammed, a Principal Instructor at Oracle University, about the key stages of AI model development. From gathering and preparing data to selecting, training, and deploying models, learn how each phase impacts AI’s real-world effectiveness. The discussion also highlights why monitoring AI performance and addressing evolving challenges are critical for long-term success.
 
 
Oracle University Learning Community: https://education.oracle.com/ou-community
 
 
 
Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode.
 
--------------------------------------------------------------
 
Episode Transcript:

00:00

Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!

00:25

Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services.

Nikita: Hey everyone! In our last episode, we spoke about generative AI and gen AI agents. Today, we’re going to look at the key stages in a typical AI workflow. We’ll also discuss how data quality, feedback loops, and business goals influence AI success. With us today is Yunus Mohammed, a Principal Instructor at Oracle University. 

01:00

Lois: Hi Yunus! We're excited to have you here! Can you walk us through the various steps in developing and deploying an AI model? 

Yunus: The first point is the collect data. We gather relevant data, either historical or real time. Like customer transactions, support tickets, survey feedbacks, or sensor logs. A travel company, for example, can collect past booking data to predict future demand. So, data is the most crucial and the important component for building your AI models.

But it's not just the data. You need to prepare the data. In the prepared data process, we clean, organize, and label the data. AI can't learn from messy spreadsheets. We try to make the data more understandable and organized, like removing duplicates, filling missing values in the data with some default values or formatting dates. All these comes under organization of the data and give a label to the data, so that the data becomes more supervised.

After preparing the data, I go for selecting the model to train. So now, we pick what type of model fits your goals. It can be a traditional ML model or a deep learning network model, or it can be a generative model. The model is chosen based on the business problems and the data we have.

So, we train the model using the prepared data, so it can learn the patterns of the data. Then after the model is trained, I need to evaluate the model. You check how well the model performs. Is it accurate? Is it fair? The metrics of the evaluation will vary based on the goal that you're trying to reach.

If your model misclassifies emails as spam and it is doing it very much often, then it is not ready. So I need to train it further. So I need to train it to a level when it identifies the official mail as official mail and spam mail as spam mail accurately. 

After evaluating and making sure your model is perfectly fitting, you go for the next step, which is called the deploy model. Once we are happy, we put it into the real world, like into a CRM, or a web application, or an API. So, I can configure that with an API, which is application programming interface, or I add it to a CRM, Customer Relationship Management, or I add it to a web application that I've got.

Like for example, a chatbot becomes available on your company's website, and the chatbot might be using a generative AI model. Once I have deployed the model and it is working fine, I need to keep track of this model, how it is working, and need to monitor and improve whenever needed. So I go for a stage, which is called as monitor and improve.

So AI isn't set in and forget it. So over time, there are lot of changes that is happening to the data. So we monitor performance and retrain when needed. An e-commerce recommendation model needs updates as there might be trends which are shifting. 

So the end user finally sees the results after all the processes. A better product, or a smarter service, or a faster decision-making model, if we do this right. That is, if we process the flow perfectly, they may not even realize AI is behind it to give them the accurate results. 

04:59

Nikita: Got it. So, everything in AI begins with data. But what are the different types of data used in AI development? 

Yunus: We work with three main types of data: structured, unstructured, and semi-structured. Structured data is like a clean set of tables in Excel or databases, which consists of rows and columns with clear and consistent data information.
Unstructured is messy data, like your email or customer calls that records videos or social media posts, so they all comes under unstructured data. 

Semi-structured data is things like logs on XML files or JSON files. Not quite neat but not entirely messy either. So they are, they are termed semi-structured. So structured, unstructured, and then you've got the semi-structured.

05:58

Nikita: Ok… and how do the data needs vary for different AI approaches? 

Yunus: Machine learning often needs labeled data. Like a bank might feed past transactions labeled as fraud or not fraud to train a fraud detection model. But machine learning also includes unsupervised learning, like clustering customer spending behavior. Here, no labels are needed.

In deep learning, it needs a lot of data, usually unstructured, like thousands of loan documents, call recordings, or scan checks. These are fed into the models and the neural networks to detect and complex patterns.

Data science focus on insights rather than the predictions. So a data scientist at the bank might use customer relationship management exports and customer demographies to analyze which age group prefers credit cards over the loans.

Then we have got generative AI that thrives on diverse, unstructured internet scalable data. Like it is getting data from books, code, images, chat logs. So these models, like ChatGPT, are trained to generate responses or mimic the styles and synthesize content.

So generative AI can power a banking virtual assistant trained on chat logs and frequently asked questions to answer customer queries 24/7.

07:35

Lois: What are the challenges when dealing with data? 

Yunus: Data isn't just about having enough. We must also think about quality. Is it accurate and relevant? Volume. Do we have enough for the model to learn from? And is my data consisting of any kind of unfairly defined structures, like rejecting more loan applications from a certain zip code, which actually gives you a bias of data?

And also the privacy. Are we handling personal data responsibly or not? Especially data which is critical or which is regulated, like the banking sector or health data of the patients. Before building anything smart, we must start smart. 

08:23

Lois: So, we’ve established that collecting the right data is non-negotiable for success. Then comes preparing it, right? 

Yunus: This is arguably the most important part of any AI or data science project. Clean data leads to reliable predictions. Imagine you have a column for age, and someone accidentally entered an age of like 999. That's likely a data entry error. Or maybe a few rows have missing ages. So we either fix, remove, or impute such issues.

This step ensures our model isn't misled by incorrect values. Dates are often stored in different formats. For instance, a date, can be stored as the month and the day values, or it can be stored in some places as day first and month next. We want to bring everything into a consistent, usable format. This process is called as transformation.

The machine learning models can get confused if one feature, like example the income ranges from 10,000 to 100,000, and another, like the number of kids, range from 0 to 5. So we normalize or scale values to bring them to a similar range, say 0 or 1. So we actually put it as yes or no options.

So models don't understand words like small, medium, or large. We convert them into numbers using encoding. One simple way is assigning 1, 2, and 3 respectively. And then you have got removing stop words like the punctuations, et cetera, and break the sentence into smaller meaningful units called as tokens. This is actually used for generative AI tasks.

In deep learning, especially for Gen AI, image or audio inputs must be of uniform size and format. 

10:31

Lois: And does each AI system have a different way of preparing data? 

Yunus: For machine learning ML, focus is on cleaning, encoding, and scaling. Deep learning needs resi

Comments 
00:00
00:00
x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

The AI Workflow

The AI Workflow