Oracle GoldenGate 23ai: The Extract Process
Description
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Team Lead: Editorial Services.
Nikita: Hi everyone! Last week, we spoke about installing GoldenGate and today, we’re diving into the Extract process. We’ve discussed it briefly in an earlier episode, but to recap, the Extract process captures changes from the source database and writes them to a trail file.
Lois: Joining us again is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. Hi Nick! Before we get into the Extract process, can you walk us through the different architecture options available for GoldenGate. Let’s start with when GoldenGate is installed on the same server as the source database. What are the benefits of this architecture?
Nick: There's a couple of advantages to this. It means that GoldenGate can use the same resources on that source database. It means that you don't need another host to support the GoldenGate environment. It also means that GoldenGate can use a bequeathed connection to connect from the Extract process into the source database to make it run faster.
The restrictions on this are that the Replicat process is highly communicative with the target database. What that really means is that the Replicat process is constantly doing lots of little transactions. And so the network latency between the Replicat process and the target database should really be around 4 milliseconds or less for optimal performance.
So that means that a lot of people can't really run GoldenGate on the source system, even though it's an option, because they need that Replicat latency performance. And so they'll often install GoldenGate on the same server as the target database. In this case, they can use the Replicat to connect using a bequeath connection to that target system, you know that it's going to be highly performant and that latency is not going to be an issue.
This works really well because the Extract process has actually been optimized to do remote capture. And so it's actually able to handle 80 milliseconds round trip ping time or less between the actual Extract process and the source database itself. And so a lot of customers will opt for this method, where they're actually running GoldenGate away from the target, or excuse me, away from the source database.
Nikita: Interesting. And is there an option where you don’t need to install GoldenGate on the actual source or target database?
Nick: We also have another architecture pattern called a Hub model. And this is what you would see in something like OCI GoldenGate or OCI Marketplace, or even in third party clouds environments where you don't have the ability to install GoldenGate on the actual source or target database. In these cases, GoldenGate is just going to run on a virtual machine or an environment that you have set up specifically for GoldenGate.
Now, this GoldenGate Hub doesn't need to have any database software installed. It doesn't need to have any database information on it. It's simply working as a client. So GoldenGate Extract process is a client connecting into the source database and the Replicat is a client connecting into the target database. And this really gives you a lot of flexibility.
However, in some cases, there may be too much of a distance, so you won't be able to get both less than 80 milliseconds on the source side in less than 4 milliseconds on the round trip on the target side. And so in that case, you can have multiple GoldenGate Hubs. And so you would have a Hub on the Extract side and another Hub on the Replicat side. And all these are fully accessible.
In this case, you'll actually use the distribution service to send the trail files from one system to another.
Lois: So, coming to the Extract process, what does it actually do?
Nick: The Extract process is configured to capture changes from that source database. In different terminology, it can subscribe to a topic if we're pulling data out of a Kafka queue or a topic or some messaging system like a JMS queue and relational database language, we're pulling database from the database transaction logs.
There's a lot of different sources and targets. You can always use the GoldenGate Certification Matrix to determine which sources and targets are supported, and where we can extract data from.
The capture process also connects to the source table for initial loads. When we do the initial load, instead of reading from the transaction logs, GoldenGate is actually going to do a select star on that table to get the information it needs for that load.
Lois: And what about the Extract process group?
Nick: The process group is kind of a grouping of the process itself, which is either going to be my Extract or Replicat and associated files. So in an Extract environment, we have our parameter file and a report file and our checkpoint files.
The parameter file, the .prm file, is going to list out which objects we're going to capture and how we're going to capture that data. It also controls what we're going to be writing to the trail file and where that trail file exists. The report file is really just a log of what's going on in that Extract process, how it's working, what tables it's encountered. It's used for any troubleshooting to make sure everything is running smoothly.
And then you also have the checkpoint files. The checkpoint files and report files should not be modified by the user, the parameter file can be. The checkpoint files are going to include information about where that process is reading from, where it's writing to, and any open transaction that it's tracking as part of the bounded recovery or cache manager functionality.
Nikita: How do you go about creating an Extract group?
Nick: The Extract group can be created by doing an Add Extract command or through the UI. Each Extract must also have a unique name. On the Extract process side, there is an eight-character hard limit for the name itself. And so, you can’t have an Extract process called my Extract for today is called Nick. More than eight characters.
Lois: Nick, I was wondering, is there a simple way to identify what an Extract or Replicat is doing?
Nick: If you need something to help identify what that Extract or Replicat is doing or the description of it, we do have a description field. So when you do the Add Extract or Add Replicat, there is a DESC field that allows you to add more details in. And this is really key because it allows you to put a lot more information that’s going to show up in all the log files at the service manager level.
And any time you do an info on the service it’ll also bring up that description field so you can see what’s going on. That way, if you get an alert, a watch, you need to keep track of something you can easily identify what that process is doing and what it’s replicating for.
Adopting a multicloud strategy is a big step towards future-proofing your business and we’re here to help you navigate this complex landscape. With our suite of courses, you'll gain insights into network connectivity, security protocols, and the considerations of working across different cloud platforms. Start your journey to multicloud today by visiting mylearn.oracle.com.
Nikita: Welcome back! Before the break, we were talking about the description field, which helps identify what the Extract is doing. Nick, are there any best practices to keep in mind when naming a group?
Nick: You also don't want to use any special characters when naming the group, especially you know things like slashes or dashes. You don't want to use spaces in them, just really stick to alphanumeric characters only.
The group names are also case insensitive, so EDEPT,