Oracle RAC Administration - Part 9: Hands on administration

Thursday Nov 16th 2006 by Tarry Singh

Tarry Singh takes a closer look at installation errors that aren't really errors, ESX host tuning for time synchronization and some SRVCTL commands.

Brief intro

Continuing the hands-on articles, we will take a deeper look at errors in the installation (which are actually not errors), ESX host tuning for time synchronization (without which the whole RHEL RAC installation means nothing) and some SRVCTL commands.

Errors come and errors go

Let's take a look at a couple of screen shots and some of the typical Cluster Ready Services (CRS) errors and tricks to bring your RAC services and applications online.

This is a typical placement error that I get on every installation. I think it may have to do with the time issue; we will come to that. If you are using an ESX server to test or develop your RAC, then the information on testing and fixing your time synchronization issues will certainly come in very handy.

It really doesn't mean a thing. I run into this every time I restart my virtual machines.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    UNKNOWN   vm1rh4
ora....H4.lsnr application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.gsd application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.ons application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    UNKNOWN   vm2rh4
ora....H4.lsnr application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.gsd application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.ons application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm3rh4.gsd application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.ons application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.vip application    ONLINE    UNKNOWN   vm1rh4
ora....SM4.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    UNKNOWN   vm4rh4

This is really a VM issue. I am using 1.2 GB of virtual memory, 2 vCPUs, and decent SCSI virtual disks (VMDK format on a VMFS-2 file system). Restarting the services does not go as expected. Let's see what happens if you try a crs_stop -all and then a crs_start -all.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop -all
Attempting to stop `ora.vm4rh4.gsd` on member `vm4rh4`
Attempting to stop `ora.vm4rh4.ons` on member `vm4rh4`
Stop of `ora.vm4rh4.gsd` on member `vm4rh4` succeeded.
Stop of `ora.vm4rh4.ons` on member `vm4rh4` succeeded.
Attempting to stop `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4`
Attempting to stop `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4`
Stop of `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4` succeeded.
Attempting to stop `ora.vm2rh4.ASM2.asm` on member `vm2rh4`
Stop of `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4` succeeded.
Attempting to stop `ora.vm1rh4.ASM1.asm` on member `vm1rh4`
Stop of `ora.vm2rh4.ASM2.asm` on member `vm2rh4` succeeded.
Attempting to stop `ora.vm2rh4.vip` on member `vm2rh4`
Stop of `ora.vm2rh4.vip` on member `vm2rh4` succeeded.
Stop of `ora.vm1rh4.ASM1.asm` on member `vm1rh4` succeeded.
Attempting to stop `ora.vm1rh4.vip` on member `vm1rh4`
Stop of `ora.vm1rh4.vip` on member `vm1rh4` succeeded.

As the crs_stat -t output shows, not all of the services were stopped: several resources, such as gsd and ons on vm1rh4, vm2rh4 and vm3rh4, are still in the UNKNOWN state.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm1rh4.gsd application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.ons application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm2rh4.gsd application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.ons application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.vip application    OFFLINE   OFFLINE
ora....SM3.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm3rh4.gsd application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.ons application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.vip application    ONLINE    UNKNOWN   vm1rh4
ora....SM4.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm4rh4.gsd application    OFFLINE   OFFLINE
ora.vm4rh4.ons application    OFFLINE   OFFLINE
ora.vm4rh4.vip application    ONLINE    UNKNOWN   vm4rh4

This is bizarre, of course. I will test it on ESX 3.0 with more capacity and see if it vanishes, but for now the task is to start all the services one by one, which is not easy when the names are truncated, as in "ora....H4.lsnr". Running crs_stat without any options gives you the full names of the services.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat

In my case it returned the following, and now you have the full names.

NAME=ora.brianic.brianic1.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.brianic.brianic2.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.brianic.brianic3.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.brianic.brianic4.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.brianic.db
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.brianic.fokeserv.brianic1.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4
NAME=ora.brianic.fokeserv.brianic2.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4
NAME=ora.brianic.fokeserv.brianic3.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4
NAME=ora.brianic.fokeserv.brianic4.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4
NAME=ora.brianic.fokeserv.cs
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4
NAME=ora.vm1rh4.ASM1.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm1rh4.LISTENER_VM1RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm1rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4
NAME=ora.vm1rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4
NAME=ora.vm1rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm2rh4.ASM2.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm2rh4.LISTENER_VM2RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm2rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4
NAME=ora.vm2rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4
NAME=ora.vm2rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm3rh4.ASM3.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm3rh4.LISTENER_VM3RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm3rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4
NAME=ora.vm3rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4
NAME=ora.vm3rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm4rh4.ASM4.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm4rh4.LISTENER_VM4RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.vm4rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4
NAME=ora.vm4rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4
NAME=ora.vm4rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
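
If you want the full names in a compact, one-line-per-resource layout, the NAME/TARGET/STATE records above can be folded together with a little awk. This is only a minimal sketch, not an Oracle utility: crs_full_table is a hypothetical helper name, and it assumes the record layout shown in the listing above.

```shell
# Sketch: fold plain crs_stat records (read from stdin) into one line per resource.
# crs_full_table is a hypothetical helper name, not part of Oracle Clusterware.
crs_full_table() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  {
            # STATE may read "UNKNOWN on vm1rh4"; keep the state and the host apart
            n = split($2, parts, " ")
            host = (n >= 3) ? parts[3] : "-"
            printf "%-40s %-8s %-8s %s\n", name, target, parts[1], host
        }
    '
}
```

You would feed it with something like: $ORA_CRS_HOME/bin/crs_stat | crs_full_table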

So I go ahead and stop them all first...

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm1rh4.gsd
Attempting to stop `ora.vm1rh4.gsd` on member `vm1rh4`
Stop of `ora.vm1rh4.gsd` on member `vm1rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm2rh4.gsd
Attempting to stop `ora.vm2rh4.gsd` on member `vm2rh4`
Stop of `ora.vm2rh4.gsd` on member `vm2rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.gsd
Attempting to stop `ora.vm3rh4.gsd` on member `vm3rh4`
Stop of `ora.vm3rh4.gsd` on member `vm3rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.ons
Attempting to stop `ora.vm3rh4.ons` on member `vm3rh4`
Stop of `ora.vm3rh4.ons` on member `vm3rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm2rh4.ons
Attempting to stop `ora.vm2rh4.ons` on member `vm2rh4`
Stop of `ora.vm2rh4.ons` on member `vm2rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm1rh4.ons
Attempting to stop `ora.vm1rh4.ons` on member `vm1rh4`
Stop of `ora.vm1rh4.ons` on member `vm1rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.vip
Attempting to stop `ora.vm3rh4.vip` on member `vm1rh4`
Stop of `ora.vm3rh4.vip` on member `vm1rh4` succeeded.
CRS-1016: Resources depending on 'ora.vm3rh4.vip' are running

Check the status:

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm1rh4.gsd application    OFFLINE   OFFLINE
ora.vm1rh4.ons application    OFFLINE   OFFLINE
ora.vm1rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm2rh4.gsd application    OFFLINE   OFFLINE
ora.vm2rh4.ons application    OFFLINE   OFFLINE
ora.vm2rh4.vip application    OFFLINE   OFFLINE
ora....SM3.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm3rh4.gsd application    OFFLINE   OFFLINE
ora.vm3rh4.ons application    OFFLINE   OFFLINE
ora.vm3rh4.vip application    OFFLINE   OFFLINE
ora....SM4.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm4rh4.gsd application    OFFLINE   OFFLINE
ora.vm4rh4.ons application    OFFLINE   OFFLINE
ora.vm4rh4.vip application    OFFLINE   OFFLINE
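
Stopping the UNKNOWN resources one at a time, as done above, gets tedious on four nodes. A small sketch like the following could generate the commands from the plain crs_stat output instead. The function names are hypothetical, and the plan is only printed (a dry run), not executed, so you can review it before running anything.

```shell
# Print the full names of resources whose STATE is UNKNOWN.
# Reads plain crs_stat output (NAME=/TYPE=/TARGET=/STATE= records) on stdin.
crs_unknown_names() {
    awk -F= '
        $1 == "NAME" { name = $2 }
        $1 == "STATE" && $2 ~ /^UNKNOWN/ { print name }
    '
}

# Dry run: print one crs_stop command per UNKNOWN resource instead of executing it.
crs_stop_plan() {
    crs_unknown_names | while read -r res; do
        printf '%s\n' "\$ORA_CRS_HOME/bin/crs_stop $res"
    done
}
```

For example: $ORA_CRS_HOME/bin/crs_stat | crs_stop_plan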

Then (fortunately) I just need the -all option to start all services.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_start -all
Attempting to start `ora.vm1rh4.vip` on member `vm1rh4`
Attempting to start `ora.vm2rh4.vip` on member `vm2rh4`
Attempting to start `ora.vm3rh4.vip` on member `vm3rh4`
Attempting to start `ora.vm4rh4.vip` on member `vm4rh4`
Start of `ora.vm2rh4.vip` on member `vm2rh4` succeeded.
Attempting to start `ora.vm2rh4.ASM2.asm` on member `vm2rh4`
Start of `ora.vm4rh4.vip` on member `vm4rh4` succeeded.
Attempting to start `ora.vm4rh4.ASM4.asm` on member `vm4rh4`
Start of `ora.vm3rh4.vip` on member `vm3rh4` succeeded.
Start of `ora.vm1rh4.vip` on member `vm1rh4` succeeded.
Attempting to start `ora.vm1rh4.ASM1.asm` on member `vm1rh4`
Attempting to start `ora.vm3rh4.ASM3.asm` on member `vm3rh4`
Start of `ora.vm2rh4.ASM2.asm` on member `vm2rh4` succeeded.
Attempting to start `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4`
Start of `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4` succeeded.
Start of `ora.vm1rh4.ASM1.asm` on member `vm1rh4` succeeded.
Attempting to start `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4`
Start of `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4` succeeded.
Start of `ora.vm3rh4.ASM3.asm` on member `vm3rh4` succeeded.
Attempting to start `ora.vm3rh4.LISTENER_VM3RH4.lsnr` on member `vm3rh4`
Start of `ora.vm4rh4.ASM4.asm` on member `vm4rh4` succeeded.
Start of `ora.vm3rh4.LISTENER_VM3RH4.lsnr` on member `vm3rh4` succeeded.
Attempting to start `ora.vm4rh4.LISTENER_VM4RH4.lsnr` on member `vm4rh4`
Start of `ora.vm4rh4.LISTENER_VM4RH4.lsnr` on member `vm4rh4` succeeded.
CRS-1002: Resource 'ora.vm1rh4.ons' is already running on member 'vm1rh4'
CRS-1002: Resource 'ora.vm2rh4.ons' is already running on member 'vm2rh4'
Attempting to start `ora.vm1rh4.gsd` on member `vm1rh4`
CRS-1002: Resource 'ora.vm3rh4.ons' is already running on member 'vm3rh4'
CRS-1002: Resource 'ora.vm4rh4.ons' is already running on member 'vm4rh4'

These errors don't mean anything: the services had already been started by the time the console command tried to start them, which caused the error messages. As you can see now...

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    vm1rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.gsd application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.ons application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    ONLINE    vm2rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.gsd application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.ons application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    ONLINE    vm3rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.gsd application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.ons application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.vip application    ONLINE    ONLINE    vm3rh4
ora....SM4.asm application    ONLINE    ONLINE    vm4rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    ONLINE    vm4rh4
[oracle@vm1rh4 ~]$
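
Since the CRS-1002 ("already running") messages above turned out to be harmless, it can help to filter them out and look only at whatever else CRS reports. The following is a hedged sketch: crs_other_messages is a hypothetical name, and treating CRS-1002 as benign is simply the observation from this installation, not a general rule.

```shell
# Print CRS-nnnn messages other than CRS-1002 from crs_start output on stdin.
# Assumption: CRS-1002 is treated as benign, as observed in this article.
crs_other_messages() {
    awk '/^CRS-[0-9]+:/ && !/^CRS-1002:/'
}
```

For example: $ORA_CRS_HOME/bin/crs_start -all 2>&1 | crs_other_messages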

After successfully completing the 4-node installation, we print out all of our CRS services:

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....c1.inst application    ONLINE    ONLINE    vm1rh4
ora....c2.inst application    ONLINE    ONLINE    vm2rh4
ora....c3.inst application    ONLINE    ONLINE    vm3rh4
ora....c4.inst application    ONLINE    ONLINE    vm4rh4
ora.brianic.db application    ONLINE    ONLINE    vm1rh4
ora....ic1.srv application    ONLINE    ONLINE    vm1rh4
ora....ic2.srv application    ONLINE    ONLINE    vm2rh4
ora....ic3.srv application    ONLINE    ONLINE    vm3rh4
ora....ic4.srv application    ONLINE    ONLINE    vm4rh4
ora....serv.cs application    ONLINE    ONLINE    vm4rh4
ora....SM1.asm application    ONLINE    ONLINE    vm1rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.gsd application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.ons application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    ONLINE    vm2rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.gsd application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.ons application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    ONLINE    vm3rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.gsd application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.ons application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.vip application    ONLINE    ONLINE    vm3rh4
ora....SM4.asm application    ONLINE    ONLINE    vm4rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    ONLINE    vm4rh4
[oracle@vm1rh4 ~]$
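
With thirty resources on screen it is easy to overlook one that is not ONLINE. As a minimal sketch (crs_not_online is a hypothetical helper, and it assumes the five-column crs_stat -t layout shown above), the listing can be checked mechanically:

```shell
# Print every crs_stat -t row whose Target or State is not ONLINE.
# Exit status is 1 if any such row is found, 0 otherwise.
crs_not_online() {
    awk '
        # skip the header, the dashed rule and blank or truncated lines
        $1 == "Name" || /^-+$/ || NF < 4 { next }
        $3 != "ONLINE" || $4 != "ONLINE" { print $1, $3, $4; bad = 1 }
        END { exit bad + 0 }
    '
}
```

For example: $ORA_CRS_HOME/bin/crs_stat -t | crs_not_online && echo "all resources online"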

As you can see, this is a very real challenge in the VMware environment. It is not yet ripe to be deployed in production, not only because of the problems we encountered here, but also because of time synchronization issues with guest operating systems like RHEL. Let's take a look at what I did to resolve these issues.

Fixing the Time Synchronization issue on VMware ESX Server host for RHEL/CentOS 4.2

The first major steps are these:

  • Editing the following files for ESX 2.x Servers

    • /etc/ntp.conf
    • /etc/ntp/step-tickers

  • For ESX Server 3.0 only, run the following command. This opens the appropriate ports and enables the NTP daemon to talk with the external server.
    [root@esxhost]# esxcfg-firewall --enableService ntpClient

  • Restarting your ntp daemon: service ntpd restart

  • Disabling the VMware tools in guests

  • Enabling the ntp daemon as a service at boot: chkconfig --level 345 ntpd on

  • Setting your local hardware clock from the (NTP-synchronized) system time: hwclock --systohc
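
The command-line steps in the list above can be collected into one small script. This is only a sketch under stated assumptions: run and ntp_host_setup are hypothetical helper names, the list does not say on which machine each command belongs, so everything is assumed to run as root on a single host, and the file edits and the VMware tools step are left to be done by hand. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# DRY_RUN=1 prints each command instead of running it (the real commands need root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the commands from the list; pass "3" on ESX Server 3.0.
ntp_host_setup() {
    if [ "${1:-}" = "3" ]; then
        run esxcfg-firewall --enableService ntpClient   # ESX Server 3.0 only
    fi
    run service ntpd restart              # pick up the edited ntp.conf
    run chkconfig --level 345 ntpd on     # start ntpd in runlevels 3, 4 and 5
    run hwclock --systohc                 # copy system time to the hardware clock
}
```

A dry run such as DRY_RUN=1 ntp_host_setup 3 lets you check the sequence before running it for real.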

Editing ntp.conf

After making a backup of the ntp.conf file on your ESX host, edit it so that it looks like this:

restrict default kod nomodify notrap
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
driftfile /etc/ntp/drift

Editing step-tickers

The servers listed here should be your known NTP servers. Your step-tickers file then looks like this:

0.pool.ntp.org
1.pool.ntp.org
2.pool.ntp.org
pool.ntp.org
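
One way to keep the two files consistent is to derive the step-tickers entries from the server lines in ntp.conf. The following is a minimal sketch; ntp_servers is a hypothetical helper name, and note that it would not produce the extra pool.ntp.org entry shown above, which would have to be added by hand.

```shell
# Print the host of every "server" line in ntp.conf-style input read from stdin.
ntp_servers() {
    awk '$1 == "server" { print $2 }'
}
```

For example, to preview the list: ntp_servers < /etc/ntp.conf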

Finally, run ntpq -p to get a detailed, real-time view of the NTP activity. And you are done.
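
In the ntpq -p listing, the peer that ntpd has currently selected for synchronization is marked with an asterisk in the first column, and the offset column is given in milliseconds. As a hedged sketch (ntp_sync_peer is a hypothetical helper name, and it assumes the usual ten-column peer listing), the selected peer and its offset can be pulled out like this:

```shell
# Print "<peer> <offset-ms>" for the peer marked with "*" in ntpq -p output (stdin).
# Exit status is 1 if no peer has been selected yet.
ntp_sync_peer() {
    awk '
        substr($0, 1, 1) == "*" { print substr($1, 2), $9; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: ntpq -p | ntp_sync_peer || echo "not synchronized yet"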

Conclusion:

In the next article, we will continue administering our ASM, creating a new service and trying to disable and enable a particular instance in order to perform, say, OS patching or any other regular maintenance.
