Oracle RAC Administration - Part 4: Administering the Clusterware: Components

Thursday Sep 7th 2006 by Tarry Singh

Part 4 of this series covers administering and restoring the OCR using OCR Backups and overruling the OCR Data Loss Protection Machinery.

Brief intro

I have a Google alert running for Oracle RAC and, besides pulling in the regular vendor offerings, it sends me a lot of alerts on how often Oracle RAC is being adopted in the enterprise. I mentioned commoditization in my previous article, but there are a lot of pressures and several odd and painful reminders that RAC needs a good administrator; even then, things can go wrong, and they can go very badly wrong. RAC is no longer exclusive to huge data centers; it is being deployed in SMB environments as well. Since administering the database will require in-house expertise, it is becoming increasingly important that we practice the installation and administration of Oracle RAC on our VMware Server and/or ESX test bed.

So, let's pick up where we left off in our previous article on Clusterware administration.

Administering the OCR Using OCR Backup Files

We will take a quick look at two methods described for copying the Oracle Cluster Registry (OCR) and recovering it. Oracle Clusterware automatically creates OCR backups every four hours and always retains the last three backup copies of the OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and, at the end of each week, a complete backup for the week. So there is a robust backup taking place in the background. And you guessed it right: you cannot alter the backup frequencies. All of this is meant to protect you, the DBA; your part is to copy these generated backup files at least once daily to a different device from the one where the primary OCR resides. By default these files are located at CRS_home/cdata/my_cluster, where CRS_home is your Clusterware home and my_cluster is the name of your cluster.
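The daily copy described above can be scripted. What follows is only a minimal sketch: the function name copy_ocr_backups and both directory arguments are hypothetical, and it assumes the generated backups carry an .ocr extension (for example backup00.ocr, day.ocr, week.ocr), so adjust the pattern if your files are named differently.

```shell
#!/bin/sh
# Sketch: copy generated OCR backup files to a second location.
# Usage: copy_ocr_backups SRC_DIR DEST_DIR
# Both directories are supplied by the caller; nothing here is Oracle-specific
# except the assumed *.ocr file pattern.
copy_ocr_backups() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    copied=0
    for f in "$src"/*.ocr; do
        [ -f "$f" ] || continue          # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1  # -p keeps timestamps for later comparison
        copied=$((copied + 1))
    done
    echo "copied $copied backup file(s) to $dest"
}

# Example call (paths are placeholders for your own environment):
# copy_ocr_backups /path/to/CRS_home/cdata/my_cluster /path/on/another/device
```

Scheduling a call like this once a day, for instance from cron, would satisfy the "at least once daily" advice; the destination should sit on a device other than the one holding the primary OCR.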

Restoring the OCR from generated OCR Backups

Given that most of us run our Oracle RAC on limited hardware, on a VMware Server or ESX Server, it is no surprise to see applications failing. Always try to restart the application first. To verify the failure run an ocrcheck. The next step is to fix the problem.

On Unix/Linux Systems

Let's do the following to restore our OCR on Unix/Linux systems.

  • To show the backups, type the command ocrconfig -showbackup
  • Check the contents of a backup by running ocrdump -backupfile my_file, where my_file is the name of the backup file.
  • Stop CRS on all nodes: crsctl stop crs from the Clusterware bin directory (the Oracle manual uses init.crs stop).
  • Perform the restore: ocrconfig -restore my_file
  • Restart CRS on all nodes: crsctl start crs (or init.crs start), or simply restart each node.
  • We have discussed the CVU (Cluster Verification Utility) and seen it play a crucial role during installation in our RAC on VMware series. Use it to check the OCR's integrity and get verbose output for all of the nodes: cluvfy comp ocr -n all -verbose
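The steps above can be sketched as a small script. This is a dry-run sketch, not a finished tool: the run and restore_ocr helpers and the DRY_RUN variable are my own illustrative names, and by default the script only prints each command so you can review the sequence before touching a real cluster. Stopping and starting Clusterware has to happen on every node, so it is left as a manual step here.

```shell
#!/bin/sh
# Sketch of the Unix/Linux OCR restore sequence.
# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_ocr BACKUP_FILE
restore_ocr() {
    backup=$1
    [ -n "$backup" ] || { echo "usage: restore_ocr BACKUP_FILE" >&2; return 1; }
    run ocrconfig -showbackup             # list the available backups
    run ocrdump -backupfile "$backup"     # inspect the chosen backup
    echo "# stop Clusterware on ALL nodes before continuing"
    run ocrconfig -restore "$backup"      # restore the OCR from the backup
    echo "# restart Clusterware on all nodes, then verify"
    run cluvfy comp ocr -n all -verbose   # check OCR integrity on all nodes
}

# Example: restore_ocr my_file
```

Keeping the default in dry-run mode is a deliberate choice: a restore is disruptive, and printing the plan first makes it easy to compare against the manual before running anything as root.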

On Windows Systems

  • Do the same as above. Check the OCR backups using the ocrconfig -showbackup command. Verify the contents of the backup using ocrdump -backupfile my_file, where my_file is your backup file.
  • Disable the OCR clients on all nodes by stopping the following services from the Service Control Panel: OracleClusterVolumeService, OracleCSService, OracleCRService, and the OracleEVMService.
  • Restore the OCR backup file with the following command: ocrconfig -restore my_file. Always check first that the OCR devices exist!
  • Start all of the services. Restart all of the nodes to bring the cluster alive.
  • To check the integrity, do the following with the CVU: cluvfy comp ocr -n all -verbose

Overruling the OCR (Oracle Cluster Registry) Data Loss Protection Machinery

Oracle Clusterware is robustly built and leaves little room for error, but an ill-considered overwrite can throw RAC out of balance. If the OCR cannot access its mirrored files and for some reason cannot verify the location of the OCR files (it could be anything: a temporary bottleneck in your SAN virtual disks or in the local shared disks you chose for your OCR, or some other temporary glitch), then the protection mechanism prevents further modification of the available OCR. It also prohibits the Clusterware from starting on the node where the problem occurs, and Oracle raises an alert in Enterprise Manager and in the Clusterware alert log files, with error messages such as CLSD-1009 or CLSD-1011. If the problem is confined to just one node, try restarting the node(s).

If that does not work and you cannot repair the OCR, then you are left with no other option except overriding the protection mechanism. Do not use it as your first resort! Oracle CRS is robust enough to check and poll the files appropriately. Be warned that data loss may occur; by that I mean that OCR updates made since your last known successful update will be lost. So if you force the configuration using the command ocrconfig -overwrite, any changes made after the last known good configuration will be gone.

How to Override:

  • Compare the failing node's OCR configuration (the Windows registry, or ocr.loc on Unix/Linux) with that of the nodes on which Clusterware is running. If they don't match, try to repair it using ocrconfig -repair.
  • Use the OCRDUMP command (we will look more into OCRDUMP later in our Administration series) to dump all information regarding the OCR configuration and check whether it contains the latest updates.
  • If you can't resolve the error messages (CLSD), run ocrconfig -overwrite to bring the node back to life.
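Before reaching for the override, it helps to confirm that the alert log really contains the CLSD messages mentioned earlier. Here is a minimal sketch of such a check; the function name has_ocr_protection_errors is hypothetical, and the log path is passed in by the caller because its location depends on your installation.

```shell
#!/bin/sh
# has_ocr_protection_errors LOG_FILE
# Exit status: 0 if the log mentions CLSD-1009 or CLSD-1011,
#              1 if it does not, 2 if the log cannot be read.
has_ocr_protection_errors() {
    log=$1
    [ -r "$log" ] || { echo "cannot read log: $log" >&2; return 2; }
    # -E enables the (a|b) alternation, -q suppresses output
    grep -E -q 'CLSD-(1009|1011)' "$log"
}

# Example (the path is a placeholder):
# if has_ocr_protection_errors /path/to/clusterware/alert.log; then
#     echo "CLSD errors found: try ocrconfig -repair before ocrconfig -overwrite"
# fi
```

The check deliberately does nothing more than report; the decision to run ocrconfig -overwrite should stay with the administrator, for the data-loss reasons given above.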


We have taken a quick look at the Clusterware's administration. We also took a look at the override possibilities to force a restore of the OCR files when the OCR's built-in protection mechanism prohibits the automatic restore. Future articles will go into more hands-on training. I have read the Oracle manual several times and have quoted it often. I advise you to go through the manual more than once. There is other documentation on RAC, even books, but nothing comes close to the Oracle Documentation, and Oracle has just made that freely available! So go ahead, download the PDF books, get the free VMware Server (or the trial of ESX 3.0, as it's called in Virtual Infrastructure 3), ask your boss for an old server and go do some magic with VMware!

