Database Sharding: Part 1


Update: 2024-08-20

Description

In this two-part episode, hosts Lois Houston and Nikita Abraham are joined by Ron Soltani, a Senior Principal Database & Security Instructor, to discuss the ins and outs of database sharding. In Part 1, they delve into the fundamentals of database sharding, including what it is and how it works, specifically looking at Oracle Database Sharding and its benefits. They also explore the architecture of a sharded database, examining components such as shards, shard catalogs, and shard directors.
 
 
Oracle University Learning Community: https://education.oracle.com/ou-community
 
 
 
Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.
 
--------------------------------------------------------
 
Episode Transcript:
 

00:00

Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!

00:26

Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.

Lois: Hi there! The last two weeks of the podcast have been dedicated to all things database security. We discussed why it’s so important and looked at all the new features related to database security that have been released in Oracle Database 23ai, previously known as 23c. 

00:55

Nikita: Today’s episode is also going to be the first of two parts, and we’re going to explore database sharding with Ron Soltani. Ron is a Senior Principal Database & Security Instructor with Oracle University. We’ll ask Ron about what database sharding is and then talk specifically about Oracle Database Sharding. We’ll look at the benefits of it and also discuss the architecture.

Lois: All this will help us to prepare for next week’s episode when we dive into each 23ai new feature related to Oracle Database Sharding. So, let’s get to it. Hi Ron! What’s database sharding?

01:32

Ron: This is basically an architecture that allows you to divide data for better computing and scaling across multiple environments instead of having a single system perform the work. So this allows you to do hyperscale computing, along with other technologies that let you distribute your queries and all other requests across these multiple components to get a very fast response.

Now, many times, having the data distributed across these databases, each of which is called a shard, allows you to have a geographical location component, while you are not really sharing any of the servers or the components. So it allows you separation and data management for each of the shards separately. However, when it comes to the application, the sharding is totally invisible. So as far as the application is concerned, it connects to a global service and submits its statements. Everything else is then managed by the sharded database underneath.

With sharded tables, the data basically gets distributed across the shards. Normally, this is done through horizontal partitioning. And then the data, depending on the partitioning scheme, will be distributed across, say, server A, server B, and server C, which are independent servers running independent databases.
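The horizontal partitioning Ron describes can be sketched in a few lines of Python. This is a minimal illustration, not Oracle's implementation: the shard names, the `customer_id` key column, and the simple hash-modulo placement are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard names; the transcript only says "server A, server B, server C".
SHARDS = ["server_a", "server_b", "server_c"]


def shard_for(key: str) -> str:
    """Map a sharding key to one shard using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def partition(rows, key_column):
    """Split rows horizontally: each whole row lands on exactly one shard."""
    placement = {name: [] for name in SHARDS}
    for row in rows:
        placement[shard_for(str(row[key_column]))].append(row)
    return placement


# Hypothetical sample data keyed by customer_id.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 10)]
placement = partition(customers, "customer_id")

for shard, rows in placement.items():
    print(shard, [row["customer_id"] for row in rows])
```

Every row is stored whole on one shard, and the same key always maps to the same shard, which is what lets a request that carries the key go straight to the right server.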

03:18

Nikita: And what about Oracle Database Sharding specifically?

Ron: Oracle Database Sharding allows you to automate how the data is distributed and replicated, and to maintain a kind of directory that defines the complete sharding scheme, while everything is distributed across many servers that share no hardware or software. It gives you very good scaling, based on this partitioning, across all of these independent servers.

And based on the subset and the discrete data configuration, you can go ahead and distribute this data across these components, where each shard is an independent data location or data component, a subset of data that can be used either individually on its own or globally across all of the shards together. And as we said, to the application, an Oracle sharded database also looks like a single component.

04:35

Lois: Ron, what are some of the benefits of Oracle Database Sharding?

Ron: With Oracle Database, you basically have linear scaling capability across as many shards as you like. And all of the different database configurations are supported with this. So you can have RAC databases across the shards, Oracle Data Guard, GoldenGate. So all of the different components are still used to give you all of the high availability and every other kind of functionality that we're generally used to having with a single database.

It provides you with fault tolerance. So each component could be down. It could have its own replicated data. It doesn't affect the other locations or the availability of the data in those other locations.

And finally, depending on data sovereignty and configuration, you could actually distribute data geographically across the different locations based on requirements and also data access to provide a higher speed for local data management. 
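As a rough illustration of placing data by geography, the sketch below pins each customer's rows to a shard in their own region. The country codes, shard names, and the `shard_for_country` helper are hypothetical and exist only for this example.

```python
# Hypothetical mapping from a customer's country to the shard hosted in that region.
REGION_SHARDS = {
    "DE": "shard_eu",
    "FR": "shard_eu",
    "US": "shard_us",
    "IN": "shard_apac",
}


def shard_for_country(country_code: str) -> str:
    """Return the regional shard that must hold data for this country."""
    try:
        return REGION_SHARDS[country_code.upper()]
    except KeyError:
        raise ValueError(f"no shard configured for country {country_code!r}") from None


print(shard_for_country("de"))
print(shard_for_country("US"))
```

Keeping the mapping explicit means data for a given country never leaves its region, and local applications read from a nearby shard.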

05:46

Lois: I’d like to understand more about the architecture of Oracle Database Sharding. Ron, can you first give us a broad overview of how Oracle Database Sharding is structured?

Ron: When it comes to the Oracle Database Sharding architecture, the components include, first, your shards. Each shard is an independent Oracle Database. Depending on the partitioning, you decide on a partition key, and that determines how the actual data is divided across those shards.

06:18

Nikita: So, these shards are like separate pieces of the database puzzle…Ok. What’s next in the architecture?

Ron: Then you have the shard catalog. The shard catalog is a catalog of your sharding configuration. It is aware of all of the components in the sharded database, and for any kind of replicated object, the master object exists in the shard catalog and is maintained from there.

And it also manages the global queries, acting as a proxy. So queries can be distributed across multiple shards. The data from the shards is returned to the catalog, grouped together, and then sent back to the client.

Now, this shard catalog is basically another version of an Oracle Database that is created independently of the shards that include the actual data, and its job is to maintain this catalog functionality. 

07:19

Nikita: Got it. And what about the shard director?

Ron: The shard director is like another form of a global service manager.

So it understands the sharding by being able to access the catalog, and it knows where everything exists. The client connection pool will hit the shard director. In general, the communication is then either distributed to the shard catalog, which proxies it, or, if the key is available, the director can send the query directly to the shard where the data exists, based on the key. So the shard can then respond to the client directly. So all of the connection pooling and the components for global administration are generally managed by the shard director.

08:11

Nikita: Can we dive into each of these components in a little more detail? Let’s go backwards and start with the shard director.

Ron: The shard director, as we said, is like a global service manager. It acts as a regional listener: all of the connection requests come to the shard director and are then distributed from there, depending on the type of connection that is being used.

Now, the director understands the topology. It maintains a complete understanding of the mapping of the data to the shards. And if the request specifies a shard key, it can then route the connection request directly to the appropriate shard, where the data resides, for a direct response.
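The routing decision described here can be sketched as a small function: with a key, go straight to the owning shard; without one, fall back to the catalog. This is a simplified illustration only. The key ranges, shard names, and the `route` function are hypothetical and do not represent the shard director's real interface.

```python
from typing import Optional

# Hypothetical topology: which range of sharding keys lives on which shard.
TOPOLOGY = [
    (range(0, 1000), "shard_1"),
    (range(1000, 2000), "shard_2"),
    (range(2000, 3000), "shard_3"),
]


def route(sharding_key: Optional[int]) -> str:
    """Pick a destination the way the transcript describes the director doing it."""
    if sharding_key is None:
        # No key supplied: the catalog has to proxy the request across shards.
        return "shard_catalog"
    for key_range, shard in TOPOLOGY:
        if sharding_key in key_range:
            # Key supplied: connect directly to the shard that holds the data.
            return shard
    raise KeyError(f"no shard owns sharding key {sharding_key}")


print(route(1500))
print(route(None))
```

The direct path avoids the catalog entirely, which is why requests that carry the key get the fastest response.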

09:03

Lois: And what can you tell us about the shard catalog?

Ron: The shard catalog is another Oracle Database that is created for the special purpose of holding the topology of the sharded database. It has all of the centralized metadata about your sharded database. It also acts as a proxy.

So, if a client request comes in without providing a shard key, then the request goes to the catalog. It can be distributed to all of the shards. The shards that actually have the data can respond, and the data can then be combined and sent back to the client. The catalog also creates the master copy of all the duplicated tables that are created in the sharded database.
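The proxy behavior for requests without a shard key amounts to a scatter-gather pattern, sketched below. The in-memory `SHARD_DATA`, the order rows, and both function names are hypothetical stand-ins for real shard databases and are not part of any Oracle API.

```python
# Hypothetical in-memory stand-ins for three independent shard databases.
SHARD_DATA = {
    "shard_1": [{"order_id": 1, "amount": 40}, {"order_id": 2, "amount": 15}],
    "shard_2": [{"order_id": 3, "amount": 75}],
    "shard_3": [],
}


def query_shard(shard, predicate):
    """Run the filter locally on one shard and return only its matching rows."""
    return [row for row in SHARD_DATA[shard] if predicate(row)]


def catalog_query(predicate):
    """Scatter the request to every shard, then gather and combine the results."""
    combined = []
    for shard in SHARD_DATA:
        combined.extend(query_shard(shard, predicate))
    # Combine into one result set before returning it to the client.
    return sorted(combined, key=lambda row: row["order_id"])


large_orders = catalog_query(lambda row: row["amount"] >= 40)
print(large_orders)
```

Shards with no matching rows simply contribute nothing, and the client sees one combined result, as if it had queried a single database.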

09:56

Lois: Ok. I’ve got it. Now, let’s talk more about the shards themselves.

Ron: Each shard is basically a database. And data is horizontally partitioned to be placed on each of these sha

Oracle Corporation