Protecting Oracle Instance with Local Clustering

Thursday Mar 13th 2003 by DatabaseJournal.com Staff

Follow along with Marin Komadina as he explains the ins and outs of Local Clustering with Sun Cluster software.

Marin Komadina

For a long time, corporations have tried very hard to keep systems running under all conditions. For many e-commerce and business applications, extended database unavailability leads directly to revenue loss. With a wide range of solutions in use (local disk mirroring, RAID, local clustering, remote disk mirroring, replication, and local clustering with Oracle Parallel Server or Real Application Clusters), we need to choose the solution that fits best. One of those solutions is local clustering with Sun Cluster software.

This article covers:

  • Local Clustering Definition
  • HA (High Availability) Oracle Agent
  • Cluster Configuration
  • Procedure for Adding a New Instance to the Cluster
  • Conclusion

Local Clustering Definition

A local cluster is defined as two or more physical machines (nodes) that share common disk storage and a logical IP address. The clustered nodes exchange cluster information over one or more heartbeat links. The cluster software collects this information and checks the state of both nodes. On an error condition, the software executes a predefined script and switches the clustered services over to the secondary machine. The Oracle instance, as one of the clustered services, is shut down together with its listener process and restarted on the secondary (surviving) node.
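As an illustration, the heartbeat-and-takeover logic can be pictured as a small shell sketch (a deliberately simplified model; the peer name, the script path, and the ping-based health check are hypothetical placeholders, and a real cluster monitor is far more sophisticated):

#!/bin/sh
# Simplified failover-monitor sketch (illustration only, not Sun Cluster code).
PEER=cnodeb                                   # hypothetical peer node on the heartbeat link
SWITCH_SCRIPT=/opt/cluster/switch_services.sh # hypothetical takeover script

while true
do
    # Probe the peer over the private heartbeat network; on Solaris,
    # "ping host timeout" exits non-zero when the host does not answer.
    ping $PEER 5 >/dev/null 2>&1
    if [ $? -ne 0 ]
    then
        # Peer missed its heartbeat: take over the logical IP address,
        # the shared volumes, and the Oracle instance with its listener.
        $SWITCH_SCRIPT takeover
    fi
    sleep 10
done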

HA Oracle Agent

HA Oracle Agent software controls Oracle database activity on the Sun Cluster nodes. The agent performs fault checking using two processes on the local node and two processes on the remote node, querying the V$SYSSTAT view for active sessions. If the database has no active sessions, the HA Agent opens a test transaction (it connects and executes, in series, create, insert, update, and drop table commands). Error codes returned to the HA Agent are validated against a special action file (a rough sketch of the probe logic follows the action list below) at this location:

/etc/opt/SUNWscor/haoracle_config_V1:

--
# Action file for HA-DBMS Oracle fault monitor
# State DBMS_er proc_di log_msg timeout int_err new_sta action    message
co      *       *       *       *       1       *       stop      Internal HA-DBMS Oracle error connecting to db
on      28      *       *       *       *       di      none      Session killed by DBA, will reconnect
*       50      *       *       *       *       di      takeover  O/S error occurred while obtaining an enqueue
co      0       *       *       1       0       *       restart   A timeout has occurred during connect
--

Takeover - the cluster software switches the services to the other node.

Stop - the cluster stops the DBMS.

None - no action is taken.

Restart - the database is restarted locally, on the same node.
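For illustration, the probe described above can be approximated from the shell with sqlplus (a minimal sketch, assuming the ha/test monitoring account created later in this article; the real agent implements this logic internally and reacts according to the action file):

#!/bin/sh
# Rough sketch of the HA Oracle Agent probe (illustration only).
# Assumes the ha/test monitoring account and a reachable artist000 alias.
sqlplus -s ha/test@artist000 <<'EOF'
whenever sqlerror exit 1
-- Activity check, as the agent performs against V$SYSSTAT.
select value from v$sysstat where name = 'user commits';
-- Test transaction (run unconditionally here for simplicity):
create table ha_probe (c number);
insert into ha_probe values (1);
update ha_probe set c = 2;
drop table ha_probe;
exit
EOF
if [ $? -ne 0 ]
then
    echo "probe failed - the action file decides: restart, takeover, or none"
fi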

The HA Oracle Agent requires the Oracle configuration files (listener.ora, oratab and tnsnames.ora) in the unique predefined location /var/opt/oracle.


Cluster Configuration

On a Sun Ultra Enterprise 10000 machine with two 100 MHz CPUs and 4096 megabytes of RAM, we have installed the Solaris 2.8 operating system and the Sun Cluster 2.2 cluster software.

One active Oracle database (artist000), running in a failover configuration with the HA Oracle Agent, already exists on that cluster.

Sun Cluster check

Cluster administration is usually under the control of the UNIX administrator. As DBAs, we only have to check whether the cluster is running. If it is not, it has to be started with the scadmin command as root.

$ haoracle list
Cluster is not up; Run scadmin {startcluster|startnode} first.
# scadmin startcluster CNODEB CLUST00C2

When the cluster is running, we can retrieve basic cluster information:

root@cnodea# haget -f mastered
CNODEA
CNODEB

Cluster configuration has two member nodes, CNODEA and CNODEB.

To find out more detailed information we can use command hastat:

root@cnodeb# hastat
	
Getting Information from all the nodes ......

HIGH AVAILABILITY CONFIGURATION AND STATUS 
              -------------------------------------------
 
LIST OF NODES CONFIGURED IN <CLUST00C2> CLUSTER
      CNODEA CNODEB
 
CURRENT MEMBERS OF THE CLUSTER
 
     CNODEA is a cluster member
     CNODEB is a cluster member
 
CONFIGURATION STATE OF THE CLUSTER
 
     Configuration State on CNODEB: Stable
     Configuration State on CNODEA: Stable
 
UPTIME OF NODES IN THE CLUSTER
 
     uptime of CNODEB:         
	 12:07pm  up 10 day(s),  5:14,  4 users,  load average: 1,10, 0,60, 0,42
     uptime of CNODEA:         
	 12:07pm  up 10 day(s),  5:14,  1 user,  load average: 0,10, 0,15, 0,17
 
LOGICAL HOSTS MASTERED BY THE CLUSTER MEMBERS
 
Logical Hosts Mastered on CNODEB:               LOGNODEH
Logical Hosts for which CNODEB is Backup Node:  None
Logical Hosts Mastered on CNODEA:               None
Logical Hosts for which CNODEA is Backup Node:  LOGNODEH

LOGICAL HOSTS IN MAINTENANCE STATE
 
     None
 
STATUS OF PRIVATE NETS IN THE CLUSTER
 
     Status of Interconnects on CNODEB:
        interconnect0: selected
        interconnect1: up
      Status of private nets on CNODEB:
        To CNODEB - UP
        To CNODEA - UP
	 
     Status of Interconnects on CNODEA:
        interconnect0: selected
        interconnect1: up
     Status of private nets on CNODEA:
        To CNODEB - UP
        To CNODEA - UP
	 
STATUS OF PUBLIC NETS IN THE CLUSTER
 
Status of Public Network On CNODEB :
 
bkggrp  r_adp   status  fo_time live_adp
nafo0   qfe0    OK      NEVER   qfe0
 
Status of Public Network On CNODEA :
 
bkggrp  r_adp   status  fo_time live_adp
nafo0   qfe0    OK      NEVER   qfe0
 
STATUS OF DATA SERVICES RUNNING IN THE CLUSTER
 
Status Of Registered Data Services
       oracle:                  On
	 
Status Of Data Services Running On CNODEB 
       Data Service "oracle":
       Database Status on CNODEB :
        artist000 - running; 

 
Status Of Data Services Running On CNODEA 
       Data Service "oracle":
       Not being managed on this system

Physical node CNODEB currently masters the logical host LOGNODEH. The Oracle database instance artist000 is running in cluster CLUST00C2, on physical node CNODEB.

Volumes associated with the existing database:

/LOGNODEH
/oracle/app/oracle/admin/artist000
/oracle/app/oracle/admin/archspace/arch
/oracle/dataspace/artist000

/LOGNODEH - a special volume that holds cluster statistical information; usually a few MB in size

/oracle/app/oracle/admin/artist000 - volume holding the database configuration and initialization files

/oracle/app/oracle/admin/archspace/arch - volume holding the database archived log files

/oracle/dataspace/artist000 - volume holding the database files
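Before relying on a failover, it is worth confirming that all four volumes are actually mounted on the node that currently masters the logical host. A simple shell check:

#!/bin/sh
# Verify that the shared volumes for artist000 are mounted on this node.
for fs in /LOGNODEH \
          /oracle/app/oracle/admin/artist000 \
          /oracle/app/oracle/admin/archspace/arch \
          /oracle/dataspace/artist000
do
    # df -k prints the mount point in its last column.
    df -k | grep "$fs\$" >/dev/null || echo "$fs is NOT mounted"
done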

HA Oracle agent check

To find out the actual configuration for Oracle services in a cluster, we can check the cluster configuration (hastat):

# hastat 
--
Status Of Data Services Running On CNODEB
       Data Service "oracle":
       Database Status on CNODEB:
        artist000 - running; 
--

or by using the HA Oracle Agent command (haoracle):

#haoracle list
on:artist000:LOGNODEH:60:10:120:300:ha/test:/oracle/app/oracle/
admin/artist000/pfile/initartist000.ora:LISTENER_artist000

All actions taken by the HA Oracle Agent software are logged to the file hadbms.log in /var/opt/SUNWscor, where we can search for additional activity information.
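During normal operation, and especially while testing failover, the agent's log can be followed in real time:

# Watch HA Oracle Agent activity as it happens.
tail -f /var/opt/SUNWscor/hadbms.log

# Or review the recorded history for one instance only.
grep artist000 /var/opt/SUNWscor/hadbms.log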

Oracle Configuration Check

In a clustered system, Oracle installation can be on a shared volume or on a local disk.

On our test system we have two nodes, and the Oracle binaries have been installed on each physical host's local disk. The Oracle HA Agent uses a special directory, /var/opt/oracle, for the Oracle initialization and configuration files.

-rw-rw-r--   1 oracle   dba          439 Feb  6 10:52 oratab
-rwxr-xr-x   1 oracle   dba         4320 Feb  6 10:52 listener.ora*
-rw-r--r--   1 oracle   dba         1216 Feb  6 10:53 tnsnames.ora

Oratab is a special configuration file with entries for all of the database instances on all nodes running the HA Oracle Agent software. The Oracle DBA must manually keep this file current on all nodes for failover to succeed.

#more oratab
# Y/N: enable dbstart and dbshut scripts
artist000:/oracle/app/oracle/product/8.1.5:N

All entries in the oratab file should have the :N option specified to ensure that the instance will not start automatically on machine reboot.
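Both requirements - identical copies on all nodes and :N on every entry - are easy to verify from the shell (a quick sketch; the peer node name cnodeb and the use of rcp for the comparison are assumptions for this example):

#!/bin/sh
# Flag oratab entries that would auto-start on boot (missing the :N flag).
grep -v '^#' /var/opt/oracle/oratab | grep -v '^$' | grep -v ':N$'

# Compare the local oratab with the copy on the other node (rcp assumed).
rcp cnodeb:/var/opt/oracle/oratab /tmp/oratab.cnodeb
diff /var/opt/oracle/oratab /tmp/oratab.cnodeb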

Listener.ora

Listener configuration file entries define the communication protocol and settings for database connections.

LISTENER_artist000 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = LOGNODEH)(PORT = 1528))
    (ADDRESS = (PROTOCOL = IPC)(KEY = artist000)))
SID_LIST_LISTENER_artist000 =
  (SID_LIST = (SID_DESC =
    (ORACLE_HOME = /oracle/app/oracle/product/8.1.5)(SID_NAME = artist000)))

Tnsnames.ora

This is the Oracle network connection configuration file, with a "service name" defined for every instance and logical host defined in the listener.ora file.

artist000 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = LOGNODEH)(PORT = 1528))
    (CONNECT_DATA = (SID = artist000)))
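With the listener up on the logical host, name resolution and listener reachability can be verified from either physical node with the standard tnsping utility, followed by an actual login with the monitoring account:

$ tnsping artist000
$ sqlplus ha/test@artist000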

Procedure for Adding a New Instance to the Cluster

Since we have sufficient free memory and CPU resources, we can add a new instance. The new project specifically requires database version 8.1.7. The Oracle software has been installed locally, on every physical host, in a new ORACLE_HOME directory. New disk space has been added to the system and new mount points created.

/LOGNODEH
/oracle/app/oracle/admin/artist000
/oracle/app/oracle/admin/artist000/arch
/oracle/dataspace/artist000
/LOGNODEI
/oracle/app/oracle/admin/artist001
/oracle/app/oracle/admin/artist001/arch
/oracle/dataspace/artist001

The new logical host name for instance artist001 is LOGNODEI. The database is up and running, with an activated listener.

Oracle configuration files update

Make new SID entries in the Oracle HA Agent configuration files for the new instance.

oratab

artist001:/oracle/app/oracle/product/8.1.7:N

Listener.ora

LISTENER_artist001 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = LOGNODEI)(PORT = 1534))
    (ADDRESS = (PROTOCOL = IPC)(KEY = artist001)))
SID_LIST_LISTENER_artist001 =
  (SID_LIST = (SID_DESC =
    (ORACLE_HOME = /oracle/app/oracle/product/8.1.7)(SID_NAME = artist001)))

Tnsnames.ora

artist001 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = LOGNODEI)(PORT = 1534))
    (CONNECT_DATA = (SID = artist001)))

Database user for fault monitoring

The Oracle HA Agent uses a special Oracle account to check database availability. This user account must be created in the Oracle database before fault monitoring is started.

SQL> CREATE USER HA IDENTIFIED BY test DEFAULT TABLESPACE SYSTEM
TEMPORARY TABLESPACE SYSTEM QUOTA 1M ON SYSTEM PROFILE "DEFAULT" ACCOUNT UNLOCK;
User created.
SQL> GRANT CONNECT, RESOURCE TO HA;
Grant succeeded.
SQL> ALTER USER HA DEFAULT ROLE CONNECT, RESOURCE;
User altered.
SQL> GRANT CREATE SESSION TO HA;
Grant succeeded.
SQL> GRANT CREATE TABLE TO HA;
Grant succeeded.
SQL> GRANT UNLIMITED TABLESPACE TO HA;
Grant succeeded.
SQL> GRANT SELECT ON SYS.V_$SYSSTAT TO HA;
Grant succeeded.
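Before handing the account to the agent, it is sensible to verify manually that it can see the statistics view and complete its test transaction (the table name ha_check is arbitrary and used only for this check):

$ sqlplus ha/test@artist001
SQL> SELECT COUNT(*) FROM V$SYSSTAT;
SQL> CREATE TABLE ha_check (c NUMBER);
SQL> INSERT INTO ha_check VALUES (1);
SQL> UPDATE ha_check SET c = 2;
SQL> DROP TABLE ha_check;
SQL> EXIT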

Oracle HA agent reconfiguration


To protect the new instance with the cluster, we need to register and activate the instance with the cluster software. Registration and activation must be performed from the mastering cluster node (the node from which we started the cluster service).

As the Unix superuser (root), run the hareg command:

#hareg -s -r oracle -h LOGNODEI 
#hareg -y oracle

The new instance has to be added to the Oracle HA agent configuration data:

#haoracle insert artist001 LOGNODEI 60 10 120 300 ha/test /oracle/app/
oracle/admin/artist001/pfile/initartist001.ora LISTENER_artist001

Parameters for the haoracle command:

artist001 - instance name

LOGNODEI - logical host for instance artist001

60 10 120 300 - connectivity probe parameters (cycle time - 60 seconds, cycle count - 10, probe timeout - 120 seconds, restart delay - 300 seconds)

ha/test - fault monitoring database account

/oracle/app/oracle/admin/artist001/pfile/initartist001.ora - database initialization file

LISTENER_artist001 - Oracle listener for the new instance

Database instance artist001 is now ready for active monitoring by the Oracle HA Agent. It is included in the configuration files, but monitoring has not yet been started.

#haoracle list
off:artist001:LOGNODEI:60:10:120:300:ha/test:/oracle/app/oracle/
admin/artist001/pfile/initartist001.ora:LISTENER_artist001
on:artist000:LOGNODEH:60:10:120:300:ha/test:/oracle/app/oracle/
admin/artist000/pfile/initartist000.ora:LISTENER_artist000

Starting cluster monitoring for new instance:

#haoracle start artist001
#haoracle list
on:artist001:LOGNODEI:60:10:120:300:ha/test:/oracle/app/
oracle/admin/artist001/pfile/initartist001.ora:LISTENER_artist001
on:artist000:LOGNODEH:60:10:120:300:ha/test:/oracle/app/
oracle/admin/artist000/pfile/initartist000.ora:LISTENER_artist000

At the moment of starting, the cluster agent checks the status of the database and the listener process. If they are not running, the cluster agent starts them.
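The agent's check can be mimicked by hand, which is useful when debugging a service that refuses to come up (a rough sketch only; the environment settings assume the local 8.1.7 ORACLE_HOME):

#!/bin/sh
# Manual approximation of the agent's startup check (illustration only).
ORACLE_SID=artist001;  export ORACLE_SID
ORACLE_HOME=/oracle/app/oracle/product/8.1.7;  export ORACLE_HOME

# Instance check: look for the PMON background process.
ps -ef | grep "ora_pmon_$ORACLE_SID" | grep -v grep >/dev/null
if [ $? -ne 0 ]
then
    echo "instance down - the agent would start it with PFILE=initartist001.ora"
fi

# Listener check: ask the listener for its status.
$ORACLE_HOME/bin/lsnrctl status LISTENER_artist001 >/dev/null 2>&1
if [ $? -ne 0 ]
then
    echo "listener down - the agent would run: lsnrctl start LISTENER_artist001"
fi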

Testing instance failover

cnodea@root# scadmin switch CLUST00C2 CNODEB LOGNODEI
                            clustname dest-host logical-hosts ...

Running the scadmin switch command starts the cluster failover process; from there the process is transparent and fully automatic. We can manually switch all of the databases, or only one, to the other host. Additionally, in the cluster log file we can monitor the database status and the actions taken by the Oracle HA Agent.

Host CNODEA

Giving up logical host LOGNODEI
fm_stop method of data service oracle completed successfully.
LOGNODEI:artist001: starting shutdown immediate for oracle instance artist001
LOGNODEI:artist001: Shutdown immediate for oracle instance artist001 completed
stop_net method of data service oracle completed successfully.
stop method of data service oracle completed successfully.
umount of /oracle/dataspace/artist001 succeeded
umount of /oracle/app/oracle/admin/artist001/arch succeeded
umount of /oracle/app/oracle/admin/artist001 succeeded
umount of /LOGNODEI succeeded
deporting artist_stl_dg
Give up of logical host LOGNODEI succeeded

Host CNODEB

Taking over logical host LOGNODEI
importing artist_stl_dg
start method of data service oracle completed successfully.
LOGNODEI:artist001: starting up Oracle Listener
start_net method of data service oracle completed successfully.
LOGNODEI:artist001: Starting up instance artist001, PFILE=initartist001.ora
LOGNODEI:artist001: Startup for instance artist001 succeeded.
fm_init method of data service oracle completed successfully.
Take over of logical host LOGNODEI succeeded

Note that the Oracle HA Agent performs a shutdown immediate. This behavior is controlled by the /etc/opt/SUNWscor/haoracle_config_V1 configuration file.

A second way to initiate instance failover is to use the haswitch command.

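haswitch takes the destination physical host followed by the logical host(s) to move; its syntax mirrors scadmin switch without the cluster name. For example, switching artist001 back would look like this (run as root):

cnodeb@root# haswitch CNODEA LOGNODEI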

Checking Oracle connections in database artist001:

SQL> SELECT S.SID, S.STATUS, P.SPID, S.SERIAL#, S.USERNAME, S.OSUSER, S.TERMINAL
FROM V$SESSION S, V$PROCESS P WHERE S.paddr = P.addr AND S.USERNAME = 'HA' ORDER BY 1, 2;

  SID STATUS   SPID   SERIAL# USERNAME OSUSER TERMINAL
----- -------- ------ ------- -------- ------ --------
   63 INACTIVE 344       9148 HA       root   console
   82 INACTIVE 29484    24809          daemon console

The database has two connections: a local connection from the HA Oracle Agent daemon, and a remote connection, as user HA, coming from the other cluster node. These two connections exist the whole time the Oracle HA Agent is running against the selected database.

Conclusion

Clustering has long been a good solution for local protection of our sensitive applications. On the other hand, it is very expensive and not technically perfect:

  • any physical (block) corruption will crash our application
  • any logical corruption (such as dropping a database object) will crash our application
  • client-server connections will not survive a cluster failover
  • uncontrolled instance failovers can be caused by network timeouts or low system resources during business hours
  • an unpleasant "shutdown abort"

Understanding these problems, people try to exploit their existing configurations to the maximum. Often they run separate databases on both nodes to get the most out of the hardware, or use the second node to offload non-critical processing from the primary node.

Trying to provide better and more reliable service for customers, while cutting hardware costs at the same time, may push companies in a new direction--Real Application Clusters, which provides greater availability and performance for less money.

» See All Articles by Columnist Marin Komadina
