
A few days back, I set up Oracle GoldenGate Microservices with the following details:
- Version: 21.12.0.0.0
- Source & Destination: Configured for one of the critical databases
This database is vital for real-time data synchronization, and Oracle GoldenGate is the preferred solution for such requirements. It’s a very popular product nowadays for data replication and synchronization across environments.
Personally, I appreciate and highly value Oracle GoldenGate for its robustness and capabilities in handling critical data replication scenarios.
Later, I decided to set up XAG (Oracle Grid Infrastructure Agents) to automate the failover of GoldenGate from Node 1 to another node without any human intervention. The aim was to keep the GoldenGate services highly available if a node failed.
I added the configuration using the agctl utility and registered the Replicat and Extract processes without any issues. The GoldenGate services also started via XAG seamlessly.
I also tested the failover, and it worked as expected: the services failed over to the standby node.
However, a few days later, I noticed that I was unable to start the GoldenGate services via XAG, as it was throwing an error during startup.
Interestingly, when I started the GoldenGate services manually, they worked perfectly fine without any issues.
This indicated that the problem was specific to XAG’s integration with GoldenGate, not with GoldenGate itself.
I found the following XAG resources in the OFFLINE state:
xag.dbtest
1 ONLINE OFFLINE STABLE◀◀◀◀◀◀◀
xag.dbtest.goldengate
1 ONLINE OFFLINE STABLE◀◀◀◀◀◀◀
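A quick way to spot this condition in a longer resource listing is to look for entries whose target is ONLINE while the state is OFFLINE. The following is only a minimal sketch: it assumes the simplified layout shown above (a resource name on one line, then an instance line with target and state), and the function name `offline_resources` is made up for illustration.

```python
def offline_resources(status_text):
    """Return resource names whose target is ONLINE but whose state is OFFLINE.

    Assumes the simplified layout from the post: a line holding only the
    resource name, followed by lines of the form
    '<instance> <TARGET> <STATE> <details>'.
    """
    offline = []
    current = None
    for raw in status_text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if len(fields) == 1 and not fields[0].isdigit():
            # A single non-numeric token is treated as a resource name.
            current = fields[0]
        elif current and fields[0].isdigit() and len(fields) >= 3:
            target = fields[1]
            state = fields[2].rstrip(".")  # tolerate stray punctuation
            if target == "ONLINE" and state == "OFFLINE" and current not in offline:
                offline.append(current)
    return offline


SAMPLE = """\
xag.dbtest
1 ONLINE OFFLINE STABLE
xag.dbtest.goldengate
1 ONLINE OFFLINE STABLE
"""

print(offline_resources(SAMPLE))
```

A resource that should be running (target ONLINE) but is not (state OFFLINE) is exactly the mismatch that prompted the log search that follows.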
Then I searched XAGTASK.log and found the following:
2025-07-10 09:06:48.584 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
2025-07-10 09:06:48.684 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] execute XAGTask HealthCheck
2025-07-10 09:06:51.088 : AGENT:1086715648: [ NONE] {1:25643:56777} {1:25643:56777} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.088 :CLSDYNAM:1086715648: [xag.dbtest.goldengate]{1:25643:56777} [start] Killing action script: start
2025-07-10 09:06:51.088 : AGFW:1086715648: [ ERROR] {1:25643:56777} Command: start for resource: xag.dbtest.goldengate 1 1 has been aborted, updating resource state/label
2025-07-10 09:06:51.088 : AGFW:1084614400: [ INFO] {1:25643:56777} Processing aborted reply : start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.088 : AGFW:1084614400: [ INFO] {1:25643:56777} Agent sending reply for: RESOURCE_START[xag.dbtest.goldengate 1 1] ID 4098:1393127
2025-07-10 09:06:51.089 : AGFW:2343531454: [ INFO] {1:25643:56777} Aborted cmd entrypoint returned : start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.089 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] Executing action script: <GRID_HOME>/bin/aggoldengatescaas[check]
2025-07-10 09:06:51.244 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] GG agent running command 'check' on xag.dbtest.goldengate
2025-07-10 09:06:51.395 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] execute XAGTask HealthCheck
2025-07-10 09:06:53.098 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] XAGTask retcode 3
2025-07-10 09:06:53.099 : AGFW:1084614400: [ INFO] {1:25643:56777} xag.dbtest.goldengate 1 1 state changed from: STARTING to: FAILED◀◀◀◀◀◀◀
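In a trace this verbose, the two things worth pulling out are the health check return codes and the resource state transitions. The sketch below shows one way to extract them. The regular expressions are written against the line shapes in the excerpt above and are an assumption about the format, not an official description of it; `summarize` is a hypothetical helper name.

```python
import re

# Patterns modelled on the excerpt above (assumed, not an official format).
RETCODE_RE = re.compile(
    r"\[(\S+?)\]\{[^}]*\} \[(\w+)\] "
    r"(?:XAG HealthCheck after start returned|XAGTask retcode) (\d+)"
)
STATE_RE = re.compile(r"(\S+) \d+ \d+ state changed from: (\w+) to: (\w+)")


def summarize(log_text):
    """Collect health check return codes and state changes, in log order."""
    events = []
    for line in log_text.splitlines():
        match = RETCODE_RE.search(line)
        if match:
            resource, action, code = match.groups()
            events.append(("healthcheck", resource, action, int(code)))
            continue
        match = STATE_RE.search(line)
        if match:
            resource, old, new = match.groups()
            events.append(("state", resource, old, new))
    return events


SAMPLE_LOG = """\
2025-07-10 09:06:48.584 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
2025-07-10 09:06:53.098 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] XAGTask retcode 3
2025-07-10 09:06:53.099 : AGFW:1084614400: [ INFO] {1:25643:56777} xag.dbtest.goldengate 1 1 state changed from: STARTING to: FAILED
"""

for event in summarize(SAMPLE_LOG):
    print(event)
```

Read this way, the sequence is short: the health check after start returned 3, the start command was aborted, the follow-up check also returned 3, and the resource moved from STARTING to FAILED.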
Based on the analysis of the crsd_scriptagent_oracle.trc trace file, I found the following error related to the XAG health check:
[xag.dbtest.goldengate]{1:25643:56777} [start] HealthCheck failed
[xag.dbtest.goldengate]{1:25643:56777} [start] Exiting with return code 3
[xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
[xag.dbtest.goldengate]{1:25643:56777} [start] execute XAGTask HealthCheck
On further analysis, I found the following guidance in the Oracle documentation:
In Oracle GoldenGate deployments using Microservices Architecture, the “Critical to deployment health” flag on Extract or Replicat processes is generally not recommended. Setting this flag can cause a single Extract or Replicat failure to bring down the entire GoldenGate deployment, and it can also prevent XAG (Oracle Grid Infrastructure Agents) from restarting the processes.
This is consistent with the logs above, where the XAG health check run after start returned 3 and the start command was aborted.
To resolve this, I disabled the “Critical to Deployment Health” flag in the GoldenGate Microservices GUI.
After making this change, the XAG-managed GoldenGate services started successfully without any issues.
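I made the change in the GUI, but the same setting can presumably be scripted through the Administration Service REST API. The sketch below only builds the request and does not send it. Treat every specific in it as an assumption to verify against the REST API reference for your version: the `/services/v2/extracts/{name}` path, the `critical` property name, the host, the port and the process name `EXT1` are all placeholders, and authentication is left out entirely.

```python
import json
import urllib.request


def build_critical_patch(base_url, process_kind, process_name, critical=False):
    """Build (but do not send) a PATCH request that sets the 'critical' flag.

    process_kind is assumed to be 'extracts' or 'replicats'; the URL layout
    and the 'critical' property are assumptions, not confirmed by the post.
    """
    url = f"{base_url.rstrip('/')}/services/v2/{process_kind}/{process_name}"
    body = json.dumps({"critical": critical}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, port and Extract name, for illustration only.
req = build_critical_patch("https://gg-host.example:9011", "extracts", "EXT1")
print(req.get_method(), req.full_url, req.data.decode())
```

Sending the request would additionally need credentials and TLS settings suited to the deployment, which is why the sketch stops at constructing it.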
