ASM Instance Termination After OCR Restore (19c/26ai)

After a storage migration activity, a two-node Oracle RAC environment experienced Oracle Cluster Registry (OCR) corruption. Although OCR was restored successfully on a newly provisioned disk, ASM instances repeatedly terminated with internal errors. Further investigation revealed that the issue was not related to Oracle software or OCR restore procedures, but to an underlying disk visibility and sharing problem at the storage layer. This article walks through the symptoms, diagnostics, root cause, and final validation that confirmed the disk issue.

Environment Overview

  • Oracle RAC: 2-node cluster
  • ASM used for:
    • OCR/Voting diskgroup (OCRVOTE)
    • Database diskgroups
  • Recent activity:
    • Storage migration performed by system/storage team
    • OCR diskgroup became corrupted post-migration

Symptoms

ASM startup failed repeatedly. The log first showed the cluster reconfiguration completing, with messages similar to:

Begin lmon rcfg omni enqueue reconfig stage6
End lmon rcfg omni enqueue reconfig stage6
Begin lmon rcfg omni enqueue reconfig stage7
End lmon rcfg omni enqueue reconfig stage7
Reconfiguration complete

Followed by internal ASM failures:

ORA-00600: internal error code, arguments: [kfcInitPba15]
ERROR: ORA-600 thrown in RBAL
RBAL: terminating the instance
ORA-1092: opitsk aborting process
Instance terminated by RBAL
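When the same crash repeats on every startup attempt, it can help to summarize the log instead of reading it by eye. The following is a minimal sketch, assuming the log text (for example from the ASM alert log) has already been read into a string. The function name `summarize_asm_crashes` and the pairing rule are illustrative choices, not part of any Oracle tool; the patterns are taken only from the excerpt above.

```python
import re

# Patterns based solely on the log lines quoted in this article.
ORA600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]+)\]")
TERMINATED = re.compile(r"Instance terminated by (\w+)")

def summarize_asm_crashes(log_text):
    """Return a list of (first ORA-00600 argument, terminating process) pairs.

    Each ORA-00600 is paired with the next 'Instance terminated by' line
    that follows it; an ORA-00600 with no later termination is ignored.
    """
    crashes = []
    pending = None  # first argument of the most recent unpaired ORA-00600
    for line in log_text.splitlines():
        error = ORA600.search(line)
        if error:
            pending = error.group(1)
            continue
        terminated = TERMINATED.search(line)
        if terminated and pending is not None:
            crashes.append((pending, terminated.group(1)))
            pending = None
    return crashes

sample = """\
ORA-00600: internal error code, arguments: [kfcInitPba15]
ERROR: ORA-600 thrown in RBAL
RBAL: terminating the instance
ORA-1092: opitsk aborting process
Instance terminated by RBAL
"""
print(summarize_asm_crashes(sample))  # [('kfcInitPba15', 'RBAL')]
```

Seeing the same argument and the same terminating process on every attempt, on both nodes, points toward something persistent such as the disk rather than a transient startup race.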

Important Observations

  • ASM failed consistently during RBAL operations
  • Failures occurred only when using the new OCR/Voting disk
  • ASM startup never stabilized across both nodes

Validation Test That Changed Everything

To isolate the issue, we performed a controlled test:

Test Action

  • Restored OCR onto an existing, known-good ASM diskgroup (used by a database)
  • Ensured the diskgroup was:
    • Properly shared
    • Stable across both RAC nodes

Test Result

  • ASM and the cluster stabilized immediately once OCR was on the known-good diskgroup
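One rough way to sanity-check "properly shared" is to read the first few kilobytes of the candidate disk on each node and compare digests. The sketch below is illustrative only: `header_digest`, `same_header`, and the 4096-byte read size are assumptions made for this example, and a temporary file stands in for the real device. Matching digests are necessary but not sufficient, since a cloned or replicated LUN would also match, so treat a mismatch as the meaningful signal.

```python
import hashlib
import os
import tempfile

def header_digest(path, nbytes=4096):
    """SHA-256 of the first nbytes of a device or file, opened read-only."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()

def same_header(digests_by_node):
    """True when every node reported the same digest for the disk."""
    return len(set(digests_by_node.values())) == 1

# Demonstration with an ordinary temporary file standing in for a device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example header bytes")
    demo_path = tmp.name

digest = header_digest(demo_path)
os.remove(demo_path)

# In practice each node would run header_digest() against its own device path
# and the results would be collected into a dict keyed by node name.
print(same_header({"node1": digest, "node2": digest}))    # True
print(same_header({"node1": digest, "node2": "0" * 64}))  # False
```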

Root Cause Summary

The newly provisioned OCR disk had storage-level issues, such as:

  • Disk not truly shared across both nodes
  • Inconsistent LUN presentation
  • Improper disk alignment or sector size
  • Underlying storage replication or fencing mismatch
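Several of the issues listed above show up as a difference between what each node reports for the "same" disk. As a hedged sketch, the comparison can be automated once the attributes have been collected per node (on Linux, the logical block size is exposed under `/sys/block/<device>/queue/logical_block_size`; identifiers such as the WWID typically come from the multipath or udev tooling). The function name, attribute keys, and sample values below are hypothetical and are not taken from the incident.

```python
def find_mismatches(attrs_by_node):
    """Return {attribute: {node: value}} for attributes that differ across nodes."""
    keys = set()
    for attrs in attrs_by_node.values():
        keys.update(attrs)

    mismatches = {}
    for key in sorted(keys):
        # A missing attribute is recorded as None, so it also counts as a difference.
        values = {node: attrs.get(key) for node, attrs in attrs_by_node.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches

# Hypothetical inventory, for illustration only.
inventory = {
    "node1": {"wwid": "3600a0980aaaa0001", "size_bytes": 10737418240, "logical_block_size": 512},
    "node2": {"wwid": "3600a0980aaaa0002", "size_bytes": 10737418240, "logical_block_size": 512},
}

for attribute, values in find_mismatches(inventory).items():
    print(attribute, values)
```

In this made-up inventory only `wwid` differs, which would mean the two nodes are looking at different LUNs even though size and sector size agree.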

As a result:

  • ASM metadata could not be initialized consistently
  • RBAL encountered fatal internal errors
  • ASM terminated to protect cluster integrity

This issue was not caused by Oracle RAC, ASM, or OCR restore procedures.
It was a pure storage-level problem introduced during migration.

Conclusion

The decisive proof came from restoring OCR onto a known-good diskgroup, which immediately stabilized ASM and the cluster.


About Me

I’m Dhiraj Kumar, and I work with Oracle RAC databases. With over 15 years of experience, I’m passionate about building high-performance, scalable database solutions that support critical business operations.

📘 Check out my latest articles and insights on Medium (@dhirajengr).