Database Sharding: Part 2
Description
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! In our last episode, we dove into database sharding and Oracle Database Sharding in particular. If you haven’t listened to it yet, I’d suggest you go back and do so before you listen to this episode because it will give you a lot of context.
Lois: Right, Niki. Today, we will discuss all the new 23ai features related to database sharding. We will cover sharding native replication, directory-based sharding, coordinated backup and restore for sharded databases, and a few more.
Nikita: And we’re so happy to have Ron Soltani back on the podcast. If you don’t already know him, Ron is a Senior Principal Database & Security Instructor with Oracle University. Hi Ron! Let’s talk about sharding native replication, which is RAFT-based, meaning that it is reliable and fault-tolerant, usually providing subsecond failover with zero data loss. Tell us more about it, please.
Ron: This is completely transparent replication built into Oracle sharding that duplicates data across the different shards. Data is generally put into chunks, and the chunks are then replicated across either three or five shards, depending on how much fault tolerance is required.
This is provided entirely by the Oracle sharded database and does not require any other component like GoldenGate or Data Guard. If you remember when we talked about the architecture, we said that each shard, each database, can have a replication component, whether through GoldenGate or through Data Guard, to maintain a standby and that way support high availability.
With sharding native replication, you don't rely on a secondary database. The shards back each other up by holding replicas, globally managing those replicas, making sure everything is preserved, and handling all of the failure operations.
Now this is a logical replication, generally consensus-based, kind of like different components that are all aware of each other. They know which component is good, depending on the load and depending on failures. The sharded databases decide behind the scenes who is actually serving the data to the client. That can provide subsecond failovers with zero data loss.
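To make the replication factor concrete, here is a minimal sketch of the majority-quorum arithmetic that consensus protocols in general rely on. It is an illustration of that general rule under stated assumptions, not a description of Oracle's internal implementation, and the function names are hypothetical.

```python
# Illustrative only: generic majority-quorum arithmetic for consensus-based
# replication. This does not model Oracle's internal implementation.


def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can fail while a majority is still available."""
    return replication_factor - majority(replication_factor)


def can_commit(replication_factor: int, healthy_replicas: int) -> bool:
    """A write can be acknowledged only if a majority of replicas is healthy."""
    return healthy_replicas >= majority(replication_factor)


for rf in (3, 5):
    print(f"replication factor {rf}: majority={majority(rf)}, "
          f"tolerated failures={tolerated_failures(rf)}")
```

Under this rule, three replicas can lose one member and five replicas can lose two, which is consistent with the two-server-failure case mentioned in this episode.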
Lois: And what are the benefits of this?
Ron: A major benefit of sharding native replication is that it is completely transparent to the application and its structures. You just specify that you want to use this replication and set the replication factor. The rest is managed by the Oracle sharded database behind the scenes.
It supports fast failover with zero data loss, usually subsecond failovers. And depending on the number of replicas, it can even tolerate multiple failures like two server failures.
And when loads are submitted, they are also load-balanced across all of these shards based on where the data and its replicas are located. This way, it can also give you somewhat better hardware utilization and load administration.
So generally, it's designed to help you keep your regular SQL-based databases without having to resort to a FauxSQL or NoSQL environment or move to other databases.
Nikita: So next is directory-based sharding. Can you tell us what directory-based sharding is, Ron?
Ron: Directory-based sharding basically allows the user to define which key values are used and combined for the different partitions, which gives better control over the location of the data: in what partition, in what shard. So this allows you to set up a good configuration.
Now, many times we may have a key that does not have enough distinct values for hash partitioning to distribute the data well. Sometimes we may not even know what keys are going to come in the future, and these need to be built later. When they are, you really don't want to have to reorganize the whole data set based on new hash functions. So directory-based sharding is for when data cannot be managed and distributed using hash partitioning, or when we need full control over where each combination of data lives.
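The skew problem with a small set of key values can be sketched as follows. The country codes, row counts, shard count, and the use of MD5 as a stable hash are all assumptions made for illustration; they are not taken from the episode or from Oracle's hash function.

```python
import hashlib

NUM_SHARDS = 4

# Hypothetical row counts per sharding key value (country code).
ROWS_PER_KEY = {"US": 1000, "IN": 800, "LU": 5, "MT": 3, "IS": 2}


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (MD5 is used only for determinism)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(rows_per_key: dict, num_shards: int = NUM_SHARDS) -> list:
    """Total rows per shard when every row of a key follows that key's hash."""
    loads = [0] * num_shards
    for key, rows in rows_per_key.items():
        loads[hash_shard(key, num_shards)] += rows
    return loads


loads = shard_loads(ROWS_PER_KEY)
ideal = sum(ROWS_PER_KEY.values()) / NUM_SHARDS
print("rows per shard:", loads, "ideal per shard:", ideal)
```

Because all rows for one key value land on the same shard, the busiest shard holds at least as many rows as the largest key, no matter how good the hash function is. With only a handful of key values, hashing cannot even out the load.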
Lois: Can you give us a practical example of how this works?
Ron: So let's say our company has a very small presence in three different countries. I can combine those three countries into one single shard, and then have three other big countries, each one sitting in its own individual shard. All of this is done through directory-based sharding. What is good about this is that the directory, which is a table, is created behind the scenes, stored in the catalog, and made available to the client, where it is cached and used for connection mapping and data access. So it can give you a lot of very high-level benefits.
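Ron's example can be sketched as a simple lookup table. This is a toy model of the idea, assuming hypothetical country codes and shard names; it is not the actual directory table format or client API.

```python
# Hypothetical directory: sharding key value (country code) -> shard name.
# Three small countries share one shard; three large ones get a shard each.
DIRECTORY = {
    "LU": "shard_small",
    "MT": "shard_small",
    "IS": "shard_small",
    "US": "shard_us",
    "IN": "shard_in",
    "BR": "shard_br",
}


def route(country: str, directory: dict = DIRECTORY) -> str:
    """Return the shard that holds the data for a key value."""
    try:
        return directory[country]
    except KeyError:
        raise LookupError(f"no directory entry for key {country!r}") from None


def keys_by_shard(directory: dict = DIRECTORY) -> dict:
    """Invert the directory to show which key values each shard holds."""
    grouped = {}
    for key, shard in directory.items():
        grouped.setdefault(shard, []).append(key)
    return grouped


print(route("MT"))
print(keys_by_shard())
```

A client that caches a mapping like this can resolve the target shard locally before connecting, which is the connection-mapping role described above.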
Nikita: Speaking of benefits, what are the key advantages of using directory-based sharding?
Ron: The first benefit is that it allows you to group the data together based on whatever values you want, depending on where you want to place them across the shards. All of that is much better and more easily controlled by us, the designers. This matters when there are not enough key values available, so that hash-based partitioning would result in an uneven distribution of the data.
Therefore, we may be able to use this directory for better distribution of the data, since we understand the data structure better than a hash function does. You also have a specification where you can create future components, future partitions, depending on how large they're going to be. Maybe you create them in an existing shard and later put them in another shard. So having all of those controls becomes essential for managing this specific type of data.
If a change to a key value is required, for example because, as we said, a client is getting too big, you can take the key value and split it, or take multiple key values and combine them, or move data from one location to another. All of these changes are maintained automatically behind the scenes once we provide them. The directory and the sharded database then manage all of the data structure, movement, everything behind the scenes using some of the future functionalities.
And finally, large chunks of data can then be moved from one location to another. This is part of the automatic chunk move capability, but it is utilized within directory-based sharding to give us control of this data and of how we move and manage it as the load or the size of the data changes.
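The idea of moving a key value from one shard to another can be sketched with a toy in-memory directory. The class, method names, and data are hypothetical, and the sketch ignores everything a real system must handle, such as concurrency and in-flight queries.

```python
from collections import defaultdict


class ShardDirectory:
    """Toy directory: key value -> shard, plus the rows held by each shard."""

    def __init__(self) -> None:
        self.key_to_shard = {}
        self.rows = defaultdict(list)  # shard name -> list of (key, payload)

    def add_key(self, key: str, shard: str) -> None:
        """Register a key value and the shard that should hold its rows."""
        self.key_to_shard[key] = shard

    def insert(self, key: str, payload: str) -> None:
        """Store a row on whichever shard the directory names for its key."""
        self.rows[self.key_to_shard[key]].append((key, payload))

    def move_key(self, key: str, target: str) -> int:
        """Relocate every row of one key, then repoint the directory entry."""
        source = self.key_to_shard[key]
        if source == target:
            return 0
        moving = [row for row in self.rows[source] if row[0] == key]
        self.rows[source] = [row for row in self.rows[source] if row[0] != key]
        self.rows[target].extend(moving)
        self.key_to_shard[key] = target
        return len(moving)


directory = ShardDirectory()
directory.add_key("LU", "shard_small")
directory.add_key("MT", "shard_small")
directory.insert("LU", "order-1")
directory.insert("LU", "order-2")
directory.insert("MT", "order-3")

# "LU" has outgrown the shared shard, so give it a shard of its own.
moved = directory.move_key("LU", "shard_lu")
print(moved, dict(directory.rows))
```

Only the rows for the moved key are touched; the other key values stay where they are, which is the contrast with rehashing the whole data set.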
Lois: Ron, what is the purpose of the coordinated backup and restore system in Oracle Database Sharding?
Ron: So, basically when we talk about a coordinated backup and restore, remember in a sharded database, I have different databases. Each database is a shard. When you take a backup, each database creates its own backup.
So to have consistent data across all of the shards for the whole schema, it is extremely important for these databases to be coordinated when the backup is taken, when the restore is being done. So you have consistency of the data maintained across all of the shards.
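Why the backup point has to be coordinated can be sketched with two toy shards. The sketch assumes a single global sequence number as a stand-in for the SCN and assumes a cross-shard change is stamped with the same number on both shards; the shard names and amounts are hypothetical.

```python
# Each shard keeps a log of (sequence_number, balance_delta).
# A cross-shard transfer of 50 is stamped with sequence 10 on both shards.
SHARD_LOGS = {
    "shard_a": [(0, 100), (10, -50)],
    "shard_b": [(0, 100), (10, 50)],
}


def backup_as_of(log: list, point: int) -> int:
    """State captured by a backup: the sum of all changes at or before `point`."""
    return sum(delta for seq, delta in log if seq <= point)


def total_in_backups(points: dict) -> int:
    """Combined balance across shards, given each shard's backup point."""
    return sum(backup_as_of(SHARD_LOGS[shard], point)
               for shard, point in points.items())


# Coordinated: every shard is backed up as of the same point.
coordinated = total_in_backups({"shard_a": 10, "shard_b": 10})

# Uncoordinated: shard_b's backup was taken just before the transfer arrived.
uncoordinated = total_in_backups({"shard_a": 10, "shard_b": 9})

print("coordinated total:", coordinated)
print("uncoordinated total:", uncoordinated)
```

With a shared backup point the transfer is visible on both shards or on neither, so the combined balance is preserved. With mismatched points the debit is captured but the credit is not, and a restore from those backups would lose money.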
Nikita: So, how does this coordination actually happen?
Ron: You don't submit this through RMAN. You submit it through the global management tool that is used for the sharded database. It's the global management tool that actually submits your request to each database, but it maintains the consistency of when the actual backup is taken, at what SCN.
So that SCN coordination ac