I have a Google alert running for Oracle RAC and, besides pulling in the regular vendor offerings, it sends me a lot of alerts on how often Oracle RAC is being adopted in the enterprise. I mentioned commoditization in my previous article, but there are many pressures and several odd and painful reminders that RAC needs a good administrator; even then, things can go wrong, and they can go very badly wrong. RAC is no longer exclusive to huge data centers; it is being deployed in SMB environments as well. Since administering the database requires in-house expertise, it is becoming increasingly important that we practice the installation and administration of Oracle RAC on our VMware Server and/or ESX test bed.
So, let's pick up where we left off in our previous article on Clusterware administration.
Administering the OCR Using OCR Backup Files
We will take a quick look at two things: how the Oracle Cluster Registry (OCR) is backed up and how it is recovered. Oracle Clusterware automatically creates an OCR backup every four hours and always retains the last three of these copies. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and, at the end of each week, a complete backup for the week. So there is a robust backup taking place in the background. And you guessed it right: you cannot alter the backup frequencies. That is meant to protect you, the DBA. Your part is to copy these generated backup files, at least once daily, to a device other than the one where the primary OCR resides. The files are located in CRS_home/cdata/my_cluster, where CRS_home is your Clusterware home and my_cluster is the name of your cluster.
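The daily copy of the generated backups can be sketched as a small shell function. This is only a minimal sketch: the function name copy_ocr_backups, the .ocr file extension and both directory arguments are assumptions made for illustration, not something Oracle prescribes.

```shell
#!/bin/sh
# Copy the generated OCR backups into a dated directory on another device.
# Usage: copy_ocr_backups <backup_dir> <dest_root>
# Assumption: the backup files in <backup_dir> end in .ocr.
copy_ocr_backups() {
    src=$1
    dest=$2/$(date +%Y%m%d)
    # Create today's target directory; give up if the destination is not writable.
    mkdir -p "$dest" || return 1
    # -p preserves timestamps, so the age of each backup stays visible.
    cp -p "$src"/*.ocr "$dest"/
}
```

A call such as copy_ocr_backups CRS_home/cdata/my_cluster /mnt/ocr_copies, scheduled once a day, would cover the daily copy; /mnt/ocr_copies is a placeholder for a mount point on a different device.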
Restoring the OCR from generated OCR Backups
Given that most of us run our Oracle RAC on limited hardware, on a VMware Server or ESX Server, it is no surprise to see applications failing. Always try to restart the application first, before you attempt any restore. To confirm that the failure really lies with the OCR, run ocrcheck. If it does, the next step is to fix the problem by restoring the OCR from a backup.
On Unix/Linux Systems
Let's do the following to restore our OCR on Unix/Linux systems.
- List the available backups: ocrconfig -showbackup
- Check the contents of the backup you intend to use: ocrdump -backupfile my_file, where my_file is the name of the backup file.
- Go to the bin directory of the CRS home and stop Oracle Clusterware on all nodes: crsctl stop crs
- Perform the restore: ocrconfig -restore my_file
- Restart Oracle Clusterware on all nodes: crsctl start crs
- We have seen the CVU (Cluster Verification Utility) play a crucial role during installation in our RAC on VMware series. Use it now to check the OCR's integrity, with verbose output for all of the nodes: cluvfy comp ocr -n all -verbose
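The Unix/Linux steps above can be laid out as a short shell function that only prints the commands in order, so the sequence can be reviewed before anything is run. This is a sketch under assumptions: plan_ocr_restore, the node names and the use of ssh as root are hypothetical, and crsctl stop crs and crsctl start crs are assumed to be the stop and start commands on your release.

```shell
#!/bin/sh
# Print (do not run) the OCR restore sequence for one backup file.
# Usage: plan_ocr_restore <backup_file> <node>...
# Assumption: each node is reachable as root over ssh.
plan_ocr_restore() {
    backup=$1
    shift
    # 1. List the backups and inspect the one chosen.
    echo "ocrconfig -showbackup"
    echo "ocrdump -backupfile $backup"
    # 2. Stop Oracle Clusterware on every node.
    for node in "$@"; do
        echo "ssh root@$node crsctl stop crs"
    done
    # 3. Restore the OCR from the backup.
    echo "ocrconfig -restore $backup"
    # 4. Start Oracle Clusterware on every node again.
    for node in "$@"; do
        echo "ssh root@$node crsctl start crs"
    done
    # 5. Verify the OCR integrity across the cluster.
    echo "cluvfy comp ocr -n all -verbose"
}
```

Calling plan_ocr_restore my_file rac1 rac2 prints eight lines, one per command; rac1 and rac2 stand in for your own node names. Printing rather than executing is deliberate: a restore is not something to run unattended.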
On Windows Systems
Do the same as above.
- Check the OCR backups using the ocrconfig -showbackup command.
- Verify the contents of the backup using ocrdump -backupfile my_file, where my_file is your backup file.
- Disable the OCR clients on all nodes by stopping the Oracle Clusterware services from the Service Control Panel, OracleCRService among them.
- Restore the OCR backup file with the ocrconfig -restore file_name command. Always check that the OCR devices exist before you run it!
- Start all of the services. Restart all of the nodes to bring the cluster back to life.
- To check the integrity, do the following with the CVU: cluvfy comp ocr -n all -verbose
Overriding the OCR (Oracle Cluster Registry) Data Loss Protection Mechanism
Oracle Clusterware is robustly built and leaves little room for error; an overwrite can throw RAC out of balance. If the OCR cannot access its mirrored files and, for some reason, cannot verify the location of the OCR files (it could be anything: a temporary bottleneck in your SAN virtual disks or in the local shared disks you chose for your OCR, in any case some temporary glitch), then Oracle Clusterware prevents further modification of the available OCR. The data protection mechanism also prohibits Clusterware from starting on the node where the OCR resides, and Oracle reports an error in Enterprise Manager and in the Clusterware alert log files. If the problem persists on just one node (the details are displayed in Enterprise Manager and in the Clusterware log files, as error messages such as CLSD-1009 or CLSD-1011), try restarting the node(s).
If that does not work and you cannot repair the OCR, you are left with no option except overriding the protection mechanism. Do not use it in the first instance! Oracle CRS is robust enough to check and poll the files appropriately. Be warned that data loss may occur: when you run ocrconfig -overwrite, any OCR updates made since the last known good state can be lost, including configuration changes you were in the middle of making.
How to Override:
Compare the OCR locations named in the error messages with the OCR configuration on the node: the Windows registry, or ocr.loc on Unix/Linux. If they don't match, try to repair the configuration (ocrconfig -repair).
Then use the OCRDUMP command (we will look more into OCRDUMP later in our Administration series) to dump all of the information in the OCR and check that it contains the latest updates.
If you still can't resolve the CLSD error messages, run ocrconfig -overwrite to bring the node back to life.
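Before reaching for the overwrite, it can help to confirm that the node can actually see the OCR locations it is configured with. The sketch below assumes an ocr.loc made of key=value lines with ocrconfig_loc and ocrmirrorconfig_loc entries; the function names are hypothetical, and because the path of ocr.loc varies by platform it is passed in as an argument.

```shell
#!/bin/sh
# Print the OCR locations named in an ocr.loc-style file.
# Assumed format: key=value lines, e.g. ocrconfig_loc=<path>.
ocr_locations() {
    sed -n -e 's/^ocrconfig_loc=//p' -e 's/^ocrmirrorconfig_loc=//p' "$1"
}

# Report whether each configured location is readable from this node.
# Returns non-zero if at least one location is not readable.
# Note: paths containing spaces are not handled in this sketch.
check_ocr_access() {
    rc=0
    for loc in $(ocr_locations "$1"); do
        if [ -r "$loc" ]; then
            echo "OK       $loc"
        else
            echo "MISSING  $loc"
            rc=1
        fi
    done
    return $rc
}
```

Running check_ocr_access against the node's ocr.loc, and comparing the output of ocr_locations with the same output from a healthy node, gives a quick view of whether the configuration or the storage is at fault. It reads files only and changes nothing.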
We have taken a quick look at Clusterware administration. We also took a look at the override option, which forces the OCR files back into service when the OCR's built-in protection mechanism prohibits their automatic restore. Future articles will go into more hands-on training. I have read the Oracle manual several times and have quoted it often. I advise you to go through the manual more than once. There is other documentation on RAC, even books, but nothing comes close to the Oracle documentation, and Oracle has just made that freely available! So go ahead: download the PDF books, get the free VMware Server (or the trial of ESX 3.0, now called Virtual Infrastructure 3), ask your boss for an old server and go do some magic with VMware!