Oracle AI Vector Search: Part 1

Update: 2024-10-22

Description

In this episode, Senior Principal APEX and Apps Dev Instructor Brent Dayley joins hosts Lois Houston and Nikita Abraham to discuss Oracle AI Vector Search. Brent provides an in-depth overview, shedding light on the brand-new vector data type, vector embeddings, and the vector workflow.
 
 
 
Oracle University Learning Community: https://education.oracle.com/ou-community
 
 
 
Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.
 
---------------------------------------------------------
 
Episode Transcript:
 

00:00

Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!

 

00:26

Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University. Joining me as always is our Team Lead of Editorial Services, Nikita Abraham.

Nikita: Hi everyone! Thanks for tuning in over the last few months as we’ve been discussing all the Oracle Database 23ai new features. We’re coming to the end of the season, and to close things off, in this episode and the next one, we’re going to be talking about the fundamentals of Oracle AI Vector Search. In today’s episode, we’ll try to get an overview of what vector search is, why Oracle Vector Search stands out, and dive into the new vector data type. We’ll also get insights into vector embedding models and the vector workflow.

01:11

Lois: To take us through all of this, we’re joined by Brent Dayley, who is a Senior Principal APEX and Apps Development Instructor with Oracle University. Hi Brent! Thanks for joining us today. Can you tell us about the new vector data type?

Brent: So this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data. Now, the vector data type provides a foundation for storing vector embeddings.

01:42

Lois: And what are vector embeddings, Brent?

Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on the meaning and context of your unstructured data. You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs. Embeddings can be used to represent almost any type of data, including text, audio, or visual data such as pictures. And they are used in proximity searches.

02:28

Nikita: Hmmm, proximity search. And similarity search, right? Can you break down what similarity search is and how it functions?

Brent: So vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space. What this means is that basically you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector. The second row in the list would be the second closest vector to the query vector, and so on, depending on your data set. What we need to do is to find the relative order of distances. And that's really what matters rather than the actual distance.
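As a rough illustration of the ranking Brent describes, here is a minimal Python sketch of an exact k-nearest search. The two-dimensional vectors, their labels, and the `top_k` helper are hypothetical and chosen only for readability; real embeddings have hundreds or thousands of dimensions. The sketch also shows why only the relative order matters: ranking by squared Euclidean distance gives the same list as ranking by Euclidean distance, even though the distance values differ.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Same ordering as euclidean(), but skips the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, vectors, k, distance=euclidean):
    """Return the ids of the k vectors closest to the query, nearest first."""
    ranked = sorted(vectors, key=lambda vid: distance(query, vectors[vid]))
    return ranked[:k]

# Hypothetical toy embeddings (not produced by any real model).
vectors = {
    "cat": [0.9, 0.1],
    "kitten": [0.8, 0.2],
    "car": [0.1, 0.9],
    "truck": [0.2, 0.8],
}
query = [0.88, 0.12]

nearest = top_k(query, vectors, k=2)
nearest_sq = top_k(query, vectors, k=2, distance=squared_euclidean)
print(nearest)                 # first entry is the most similar vector
print(nearest == nearest_sq)   # the ranking does not depend on the exact distance values
```

The first row of the result is the closest vector, the second row is the second closest, and so on, which matches the ordered list described above.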

Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters.
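The difference between exact and approximate searches can be sketched in Python as follows. This is a simplified, hypothetical model of a cluster-based index, not a description of how Oracle's vector indexes are implemented: the clusters, centroids, and the `probes` parameter are assumptions made for illustration. The exact search visits every vector in every cluster, while the approximate search only visits the clusters whose centroids are closest to the query.

```python
def sq_dist(a, b):
    """Squared Euclidean distance; sufficient for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical clusters, each with a centroid and its member vectors.
clusters = {
    "animals": {
        "centroid": [0.85, 0.15],
        "members": {"cat": [0.9, 0.1], "kitten": [0.8, 0.2]},
    },
    "vehicles": {
        "centroid": [0.15, 0.85],
        "members": {"car": [0.1, 0.9], "truck": [0.2, 0.8]},
    },
}

def rank(query, candidates, k):
    """Rank candidate ids by distance and report how many vectors were visited."""
    ranked = sorted(candidates, key=lambda vid: sq_dist(query, candidates[vid]))
    return ranked[:k], len(candidates)

def exact_search(query, k):
    """Visit the vectors of all clusters."""
    candidates = {}
    for cluster in clusters.values():
        candidates.update(cluster["members"])
    return rank(query, candidates, k)

def approximate_search(query, k, probes=1):
    """Visit only the `probes` clusters whose centroids are nearest to the query."""
    nearest = sorted(
        clusters, key=lambda name: sq_dist(query, clusters[name]["centroid"])
    )[:probes]
    candidates = {}
    for name in nearest:
        candidates.update(clusters[name]["members"])
    return rank(query, candidates, k)

query = [0.88, 0.12]
exact_ids, exact_visited = exact_search(query, k=3)
approx_ids, approx_visited = approximate_search(query, k=3, probes=1)
print(exact_ids, exact_visited)
print(approx_ids, approx_visited)
```

With a fetch size of 3, the approximate search that probes a single cluster visits fewer vectors but cannot return the third-nearest neighbor, because that vector lives in a cluster it never visited. Probing both clusters returns the same result as the exact search, which is the trade-off between speed and accuracy in this toy model.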

03:44

Lois: Ok. I want to move on to vector embedding models. What are they and why are they valuable?

Brent: Vector embedding models allow you to assign meaning to a word, a sentence, the pixels in an image, or perhaps audio. They allow you to quantify features or dimensions. Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings.

04:33

Nikita: And what about visual data?

Brent: For visual data, you can use a residual network, also known as ResNet, to generate vector embeddings. You can also use a visual spectrogram representation for audio data. And that allows the audio data to fall back into the visual data case. Now, these models can also be based on your own data set. Each model also determines the number of dimensions for your vectors.

05:02

Lois: Can you give us some examples of this, Brent?

Brent: Cohere's embedding model, embed-english-v3.0, has 1,024 dimensions. OpenAI's embedding model, text-embedding-3-large, has 3,072 dimensions.
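To give a feel for what the number of dimensions means in practice, the following Python sketch estimates the raw size of a single embedding for the two models mentioned. It assumes each dimension is stored as one number in a fixed-width format (8-bit integer, 32-bit float, or 64-bit float) and it ignores any per-vector header or storage overhead, so the figures are rough payload sizes only. The `payload_bytes` helper is hypothetical.

```python
# Bytes used by one vector element in each assumed number format.
BYTES_PER_ELEMENT = {"INT8": 1, "FLOAT32": 4, "FLOAT64": 8}

# Dimension counts quoted in the episode.
MODEL_DIMENSIONS = {
    "Cohere embed-english-v3.0": 1024,
    "OpenAI text-embedding-3-large": 3072,
}

def payload_bytes(dimensions, element_format="FLOAT32"):
    """Raw size of one vector: one element per dimension, no overhead counted."""
    return dimensions * BYTES_PER_ELEMENT[element_format]

for model, dims in MODEL_DIMENSIONS.items():
    size = payload_bytes(dims)
    print(f"{model}: {dims} dimensions, about {size} bytes as FLOAT32")
```

Under these assumptions, a model with three times as many dimensions produces vectors that take three times as much space, which is one reason the choice of embedding model matters when you plan storage.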

05:24

Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started. 

05:50

Nikita: Welcome back! Let’s now get into the practical side of things. Brent, how do you import embedding models?

Brent: Although you can generate vector embeddings outside the Oracle Database using pre-trained open source embedding models or your own embedding models, you also have the option of generating them within the Oracle Database. To do that, you need to use models that are compatible with the Open Neural Network Exchange standard, or ONNX, which is pronounced "onyx."

Oracle Database implements an ONNX runtime directly within the database, which allows you to generate vector embeddings directly inside the Oracle Database using SQL.



Oracle Corporation