Oracle Database 23ai: Backup and Recovery - Part 2
Description
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi everyone! Last week, we had a fantastic chat with Bill Millar, our Senior Principal Database & MySQL Instructor. We dug into the basics of backup and recovery. We touched on everything from a DBA’s role in preventing data loss to handling different types of failures, and even some common mistakes that tend to pop up when managing a database.
Nikita: Yeah, if you missed that episode, definitely go back and check it out. It’s packed with useful info, especially if you’re in charge of keeping databases safe.
Lois: Today, we’re picking up where we left off. We’re going to ask Bill about instance recovery and recovery strategies. Bill, can you kick things off by explaining what instance recovery is?
Bill: You can understand instant recovery by becoming familiar with the checkpoint process, the redo log files, and the role of the log writer with the redo log files.
Automatically instance or crash recovery. What is it doing? What are the phases of instance recovery? How we possibly can tune that instance recovery. We can use the mean time to recovery advisor that can help us determine how we might tune the instance recovery.
Nikita: OK, so let’s go through some of these concepts and procedures you mentioned. What is the checkpoint process responsible for exactly?
Bill: The checkpoint process itself, it's responsible for updating the data file headers with checkpoint information. When a checkpoint is taken, it is going to write into the controlfiles. It tells the DB writer to write. DB writer writes to the data files, and the checkpoint is also annotated in the data files.
So updating controlfiles with that checkpoint information also, controlfiles and database files. It signals that DB writer at full check points again, hey, it's time to write. So that way, it has the latest data written to the data files. The controlfile and datafiles, those are in sync with that.
Lois: Bill, what about the log writer process and the redo log files?
Bill: With the log writer process and the redo log files, the redo log files record the changes to the database himself. It should be multiplexed.
Nikita: What do you mean by that?
Bill: More than one redo log group. Now, the redo log groups, it is recommended that they should be multiplexed. Each group member should be on a different disk or in a different disk group if you're using ASM.
Nikita: And why is that, Bill?
Bill: Because if I lose one, if I lose one redo log group, one member, I can continue to operate with just the one. If I only have one redo log group member and the system comes around and tries to write to it, then my system is going to come to a halt.
So the log writer is going to write to those redo logs whenever somebody does a commit. When that redo log buffer is 1/3 full or every three seconds and before DB writer writes. So those are the four mechanism that tells log writer to write from that log buffer to the redo log files. And it'll also write, when we do a shut down, all the buffers will be flushed. And so that way, everything will be in sync when the system is shut down.
Lois: What are the different modes of operation for a database, Bill? And how do these modes impact the recovery capabilities of the database?
Bill: So we have two different modes we can operate in. One is called NOARCHIVELOG mode. It is the default. ARCHIVELOG mode, highly encouraged. But not every environment has to be in ARCHIVELOG mode.
Nikita: So with ARCHIVELOG mode…
Bill: Closed database. You have to close it, recover to the last backup. That's as far as I can go. Actually, I could, depending on what happens, I might be able to apply some redo.
Suitable for training and test environments or for data warehouses, we don't have a lot of frequent changes. It's mainly bulk loading data at night and querying during the day. So it might be appropriate for that.
Because ARCHIVELOG mode, it is a little overhead. Yes. So with that database, it goes down while it's open. The system, when it comes up, it can recover to the last committed transaction. And this is usually the mode we want to operate in for production environments.
So we have that data in the buffer cache. We have that redo being buffered. We have the undo tablespace, keeping track of what the data was before a change. The redo keeps track of what was the change.
And if we're in ARCHIVELOG mode, as we switch from one redo log to another, we will generate what's referred to as archived log files, and that's what allows us to do a complete recovery.
Lois: What happens in the case of automatic instance recovery?
Bill: For an automatic instance recovery or crash recovery, our system went down unexpectedly. Because it did not do a clean shutdown, the buffers were not flushed. Everything was not synchronized. So the datafile, controlfile, everything is out of sync.
Nikita: So, how do the files get synchronized then?
Bill: It uses the redo log groups to synchronize the files. It's going to roll forward. It rolls forward the changes that were made. So due to different distinct operations. Roll forward applies committed and uncommitted data. And the redo does not keep track of what was committed and uncommitted.
It'll keep track of, hey, I had this transaction, hey, here's a commit for that transaction. But hey, I have a transaction. That was never uncommitted. That's the job of the undo. But rolls forward all those changes. And then anything that did not actually receive a commit, it will roll back the uncommitted data, return to the original state. And that is the job of the undo tablespace.
Lois: Bill, is it possible to tune instance recovery for better performance?
Bill: You can try to tune this instance recovery. Tuning it is touchy. Be careful because you can cause more harm than what you think you might be doing good. The instance recovery, what we're doing, we're trying to-- the transactions between checkpoints. When was the last checkpoint?
Because the items between the checkpoints, that's what has to be reapplied. So the last checkpoint to the last redo log, what is that time frame there between those? Well, what we're going to do, we're going to try to control that. We're going to try to control the difference between the checkpoint and the end of the redo log.
There is a mean-time recovery advisor. You specify the desired times in seconds or minutes that how often you want that checkpoint to occur.
There is a parameter, FAST_START_MTTR parameter that you can set. The default value is zero saying, hey, I'm going to let the system take care of it. And the maximum you can set it is to one hour.
Nikita: And why 1 hour?
Bill: The reason being, if I set that to one hour and I have a lot of activity, how long is it going to take? How many transactions can happen within that hour? Yeah, I'm not doing a checkpoint as often, so I'm eliminating that workload. But if it has to recover, how long is it going to take?
If I set it too small, the system says, hey, right now, it's going to take me 19 seconds based off statistics. If I said, OK, I want it in five seconds. So what does that mean? Every five seconds, I'm saying do a checkpoint. So what is it doing? OK, time to do a checkpoint.
OK, time to go ahead and OK, DB writer write. OK, log writer write. OK, let me update the datafiles and the controlfiles. So you're just thrashing your system. So be careful if you decide to try to manually tune it.
And when you go out and look at this mean time to recover, and even if you do it through the command line, you'll see that,