Oracle GoldenGate Microservices Failing to Start via XAG

A few days ago, I set up Oracle GoldenGate Microservices with the following configuration:

  • Version: 21.12.0.0.0
  • Source & Destination: Configured for one of the critical databases

This database depends on real-time data synchronization, and Oracle GoldenGate is a widely used product for replicating and synchronizing data across environments.

Personally, I appreciate and highly value Oracle GoldenGate for its robustness and capabilities in handling critical data replication scenarios.

Later, I decided to set up XAG (Oracle Grid Infrastructure Agent) to automate the failover of GoldenGate from Node 1 to another node without any human intervention. This was aimed at ensuring high availability of the GoldenGate services in case of any node failure.

I successfully added the configuration using the agctl utility and registered the replicat and extract processes without any issues. The GoldenGate services also started via XAG seamlessly.

I also tested the failover functionality, and it worked as expected: the services failed over to the standby node.

However, a few days later, I noticed that I could no longer start the GoldenGate services via XAG; the startup failed with an error.

Interestingly, when I started the GoldenGate services manually, they worked perfectly fine without any issues.

This indicated that the problem was specific to XAG’s integration with GoldenGate, not with GoldenGate itself.

The cluster resource status showed both XAG resources with target ONLINE but state OFFLINE:

xag.dbtest
1        ONLINE  OFFLINE STABLE◀◀◀◀◀◀◀
xag.dbtest.goldengate
1        ONLINE  OFFLINE STABLE◀◀◀◀◀◀◀
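A check like the one above can be scripted. The following is a minimal sketch, not part of XAG or Clusterware: it assumes the status text has the same shape as the excerpt above (a resource name on its own line, followed by instance lines of the form number, target, state). The function name find_down_resources and the sample text are made up for illustration.

```python
import re

# Sample text modelled on the status excerpt above (assumed layout).
SAMPLE_STATUS = """\
xag.dbtest
1        ONLINE  OFFLINE STABLE
xag.dbtest.goldengate
1        ONLINE  OFFLINE STABLE
"""


def find_down_resources(status_text):
    """Return resources whose target is ONLINE but whose state is not ONLINE."""
    down = []
    current = None  # most recent resource name seen
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        # A resource name sits alone on its line (skip dashed separator lines).
        if len(tokens) == 1 and not first.isdigit() and not first.startswith("-"):
            current = first
            continue
        # An instance line looks like: <instance-number> <target> <state> ...
        if current and first.isdigit() and len(tokens) >= 3:
            target = tokens[1]
            state = re.sub(r"[^A-Z]", "", tokens[2])  # drop stray punctuation
            if target == "ONLINE" and state != "ONLINE" and current not in down:
                down.append(current)
    return down


print(find_down_resources(SAMPLE_STATUS))
# ['xag.dbtest', 'xag.dbtest.goldengate']
```

In practice the text would come from the cluster status command you already use; the parser only looks at the name, target, and state columns.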

Then I searched XAGTASK.log and found the following:

2025-07-10 09:06:48.584 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
2025-07-10 09:06:48.684 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] execute XAGTask HealthCheck
2025-07-10 09:06:51.088 :   AGENT:1086715648: [     NONE] {1:25643:56777} {1:25643:56777} Created alert : (:CRSAGF00113:) :  Aborting the command: start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.088 :CLSDYNAM:1086715648: [xag.dbtest.goldengate]{1:25643:56777} [start] Killing action script: start
2025-07-10 09:06:51.088 :    AGFW:1086715648: [    ERROR] {1:25643:56777} Command: start for resource: xag.dbtest.goldengate 1 1 has been aborted, updating resource state/label
2025-07-10 09:06:51.088 :    AGFW:1084614400: [     INFO] {1:25643:56777} Processing aborted reply : start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.088 :    AGFW:1084614400: [     INFO] {1:25643:56777} Agent sending reply for: RESOURCE_START[xag.dbtest.goldengate 1 1] ID 4098:1393127
2025-07-10 09:06:51.089 :    AGFW:2343531454: [     INFO] {1:25643:56777} Aborted cmd entrypoint returned : start for resource: xag.dbtest.goldengate 1 1
2025-07-10 09:06:51.089 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] Executing action script: <GRID_HOME>/bin/aggoldengatescaas[check]
2025-07-10 09:06:51.244 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] GG agent running command 'check' on xag.dbtest.goldengate
2025-07-10 09:06:51.395 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] execute XAGTask HealthCheck
2025-07-10 09:06:53.098 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] XAGTask retcode  3
2025-07-10 09:06:53.099 :    AGFW:1084614400: [     INFO] {1:25643:56777} xag.dbtest.goldengate 1 1 state changed from: STARTING to: FAILED◀◀◀◀◀◀◀
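When the log is long, it helps to pull out only the lines that matter. The sketch below assumes the line format shown in the excerpt above and extracts two things: health-check return codes that are not zero, and resource state transitions. The regular expressions and the function name summarize_agent_log are my own, written against this excerpt only, so they may need adjusting for other agent versions.

```python
import re

# Three lines copied from the excerpt above, used as sample input.
SAMPLE_LOG = """\
2025-07-10 09:06:48.584 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
2025-07-10 09:06:53.098 :CLSDYNAM:2343531454: [xag.dbtest.goldengate]{1:25643:56777} [check] XAGTask retcode  3
2025-07-10 09:06:53.099 :    AGFW:1084614400: [     INFO] {1:25643:56777} xag.dbtest.goldengate 1 1 state changed from: STARTING to: FAILED
"""

# "[resource]{id} [action] message" as written by the script agent.
RESOURCE_MSG = re.compile(
    r"\[(?P<res>[^\]\s]+)\]\{[^}]*\}\s+\[(?P<action>\w+)\]\s+(?P<msg>.*)"
)
# The two message shapes in the excerpt that carry a return code.
RETURN_CODE = re.compile(
    r"(?:HealthCheck after start returned|XAGTask retcode)\s+(\d+)"
)
STATE_CHANGE = re.compile(
    r"(?P<res>\S+) \d+ \d+ state changed from: (?P<old>[A-Z_]+) to: (?P<new>[A-Z_]+)"
)


def summarize_agent_log(log_text):
    """Return (nonzero_checks, state_changes) found in the log text."""
    nonzero_checks = []  # (resource, action, return code)
    state_changes = []   # (resource, old state, new state)
    for line in log_text.splitlines():
        tagged = RESOURCE_MSG.search(line)
        if tagged:
            code = RETURN_CODE.search(tagged.group("msg"))
            # Assumption: a nonzero code is worth reporting.
            if code and int(code.group(1)) != 0:
                nonzero_checks.append(
                    (tagged.group("res"), tagged.group("action"), int(code.group(1)))
                )
        change = STATE_CHANGE.search(line)
        if change:
            state_changes.append(
                (change.group("res"), change.group("old"), change.group("new"))
            )
    return nonzero_checks, state_changes


checks, changes = summarize_agent_log(SAMPLE_LOG)
print(checks)
print(changes)
```

On the sample lines this reports the health check returning 3 during both start and check, followed by the transition from STARTING to FAILED, which is the same sequence visible in the full log.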


The crsd_scriptagent_oracle.trc trace file showed the following errors from the XAG health check:

[xag.dbtest.goldengate]{1:25643:56777} [start]  HealthCheck failed
[xag.dbtest.goldengate]{1:25643:56777} [start]  Exiting with return code 3
[xag.dbtest.goldengate]{1:25643:56777} [start] XAG HealthCheck after start returned 3
[xag.dbtest.goldengate]{1:25643:56777} [start] execute XAGTask HealthCheck

On further analysis, I found the following Oracle documentation:

https://docs.oracle.com/en/database/oracle/oracle-database/19/haovw/ogg-microservices-premises1.html#GUID-13194A0B-BE42-47D0-85D0-86FFC94130E8

In Oracle GoldenGate deployments using Microservices Architecture, enabling the “Critical to deployment health” flag on Extract or Replicat processes is generally not recommended. With this flag set, a single Extract or Replicat failure can bring down the entire GoldenGate deployment, and it can also prevent XAG (Oracle Grid Infrastructure Agent) from restarting the processes.

To resolve this, I disabled the “Critical to deployment health” flag on the Extract and Replicat processes in the GoldenGate Microservices GUI.

After making this change, the XAG-managed GoldenGate services started successfully without any issues.
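With many Extract and Replicat processes, clicking through the GUI to find the flag can be tedious. The sketch below is only an illustration of how such an audit could look. It assumes you have already retrieved each process definition as JSON (for example, from the Administration Service REST API) and that the flag appears there as a boolean field named critical. That field name is an assumption on my part, so verify it against the REST API reference for your GoldenGate version. The process names EXT1, REP1, and REP2 are hypothetical.

```python
def critical_processes(definitions):
    """Return the names of processes flagged as critical to deployment health.

    `definitions` maps a process name to its JSON definition (a dict).
    Assumption: the flag is exposed as a boolean field named "critical".
    """
    return sorted(
        name
        for name, definition in definitions.items()
        if definition.get("critical") is True
    )


# Hypothetical definitions, already decoded from JSON.
sample_definitions = {
    "EXT1": {"critical": True},
    "REP1": {"critical": False},
    "REP2": {},  # flag absent, treated as not critical
}

print(critical_processes(sample_definitions))
# ['EXT1']
```

Any process this reports is a candidate for having the flag turned off, as described above.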


About Me

I’m Dhiraj Kumar, and I work with Oracle RAC databases. With over 15 years of experience, I’m passionate about building high-performance, scalable database solutions that support critical business operations.

📘 Check out my latest articles and insights on Medium (@dhirajengr).