Tips for using Tivoli Storage Manager with DB2

Wednesday Apr 28th 2004 by Marin Komadina
Share:

If you're having problems doing backups with Tivoli Storage Manager, review these tips to find potential solutions.

As a final word about Tivoli Storage Manager (TSM) database backup, here are a few tips I have found working with the TSM system. Only the most interesting tips are included since presenting all of them would be beyond the scope of this article.

This article covers:

  • Tips Using TSM
  • Conclusion

Tips Using TSM

Tip 1.User Exit is archiving log files more than once

An unusual and confusing DB2 feature is the existence of duplicated archived log files on the TSM server. One theory is that once a database log file is full a database manager is creating a user exit program queue entry. The user exit program is starting with a log file copy to the archive log destination. In our case, the archive log destination is on the TSM server. Upon finishing, a database log file is renamed to a new name with higher sequence log number. This explanation is as expected, unless a TSM inventory is made.

$db2adutl query archive log 

API 20.480 B  19.01.2004 20:55:25   
	/ARTIST/NODE0000/S0000018.LOG 
	Log file for DB2 database ARTIST
API 16.384 B  19.01.2004 23:51:34   
	/ARTIST/NODE0000/S0000019.LOG 
	Log file for DB2 database ARTIST
API 12.288 B  20.01.2004 01:06:14   
	/ARTIST/NODE0000/S0000020.LOG 
	Log file for DB2 database ARTIST
API 53.248 B  23.01.2004 18:51:56   
	/ARTIST/NODE0000/S0000021.LOG 
	Log file for DB2 database ARTIST
API 53.248 B  23.01.2004 18:56:31   
	/ARTIST/NODE0000/S0000021.LOG 
	Log file for DB2 database ARTIST
API 12.288 B  23.01.2004 18:57:31   
	/ARTIST/NODE0000/S0000022.LOG 
	Log file for DB2 database ARTIST
API 12.288 B  23.01.2004 18:57:31   
	/ARTIST/NODE0000/S0000023.LOG 
	Log file for DB2 database ARTIST
API 36.864 B  23.01.2004 18:51:51   
	/ARTIST/NODE0000/S0000024.LOG 
	Log file for DB2 database ARTIST
API 12.288 B  24.01.2004 01:05:07   
	/ARTIST/NODE0000/S0000025.LOG 
	Log file for DB2 database ARTIST
API 237.568 B  24.01.2004 02:43:33  
	/ARTIST/NODE0000/S0000026.LOG 
	Log file for DB2 database ARTIST
Listing 1: Listing archived log files directly from the TSM server

The database log file S0000021.LOG had occurred two times on the TSM server. Both archived log files have the same size; however, there is a difference in the timestamp.

Knowing that the main principle of database integrity is the existence of a unique archive log file, there may be uncertainty about which one is the right one. The secret lies in the imperfection of the user exit program and not in the DB2 database itself.

To demonstrate my claim, I have created a scenario for generating duplicate archive log files on the TSM server:

  $ db2 commit
->	database log file closed and ready for archiving 
->	user exit program is taking copy of the file and starting with file
	sending to the TSM server
  $ db2 deactivate db artist
  DB20000I  The DEACTIVATE DATABASE command completed successfully.  
->	request for database closing executed
->	database is closing, user exit did not return confirmation to DBM, that
	that the archiving was finished successfully, database is closed 
  $ db2 activate db artist
  DB20000I  The ACTIVATE DATABASE command completed successfully.
->	database is actived, DBM is sending new archive request f	or the same 
	database log file, and the user exit program is closing and sending log 
	file one again to the TSM server
Listing 2: Generation of duplicated archive log files on the TSM server

Following this explanation, the user exit program, initiated from DBM is creating two copies on the TSM server. It has been my experience that we can even find two identical files with different sizes, where the first file is smaller than the second one. Generally speaking, this is not a problem, because the file with the later timestamp is only considered during recovery.

Tip 2. Backup restored from the TSM server having wired timestamp

During a DB2 database redirected, restore operation there was an occasion to retrieve a database backup image from the TSM on the local file system. After restoring the backup image, the restored file has "strange" timestamp information associated.

$ dsmc res  "/ARTIST/NODE0000/ARTIST.0.artist.NODE0000.CATN0000.20040116010133.000

Tivoli Storage Manager
Command Line Backup Client Interface - Version 4, Release 2, Level 2.1 
(C) Copyright IBM Corporation, 1990, 2001, All Rights Reserved.
 
Restore function invoked.
 
Node Name: ARTIST
Session established with server TESTTSM001: AIX-RS/6000
  Server Version 5, Release 1, Level 6.4
  Server date/time: 17.01.2004 01:03:33  Last access: 10.01.2004 10:50:37
 
 ** Interrupted **
ANS1114I Waiting for mount of offline media.
Restoring                 /ARTIST/NODE0000/ARTIST.0.artist.NODE0000.CATN0000.20040116010133.000

---> /ARTIST/NODE0000/restore/ARTIST.0.artist.NODE0000.CATN0000.20040116010133.000 [Done]      
 
Restore processing finished.
                                  
Total number of objects restored:         1
Total number of objects failed:           0
Total number of bytes transferred:   324,49 MB
Data transfer time:                   26,10 sec
Network data transfer rate:        13.120,92 KB/sec
Aggregate data transfer rate:      1.638,27 KB/sec
Elapsed processing time:           00:03:29

->-> ls -lrt
----rw-r--   1 artist  db2   1119911936 Aug 15  1995 
ARTIST.0.artist.NODE0000.CATN0000.20040116010133.000

Listing 3: Restoring database backup image from the TSM server to the local filesystem

The restored backup image file has an incorrect date-stamp, showing a file date-stamp of 15.08.1995, instead the regular one of 01.16.2004. Searching for an explanation of these phenomena, all I found was a "small incompatibility" between the Sun Solaris operating system and the TSM Server. Ignoring that, the restored database backup was fully usable.

Tip 3. Discovering the user exit error codes

The IBM DB2 documentation is missing a link to the user exit program error code list. Suppose that there is a problem during the archiving of the database log files, heaping the errors and warning messages in the database error log db2dump.log as displayed below:

$tail -f db2diag.log

2004-01-11-14.17.07.452242   Instance:artist   
	Node:000
PID:13064(db2med.22804.0 3)   Appid:none
database_utilities  sqluMCWriteToDevice   Probe:50
Media controller -- problem writing to vendor device. 
	rc = -2025
 
2004-01-11-14.17.07.502244   Instance:db2udb1   
	Node:000
PID:22804(db2agent (ARTIST) 0)   
	Appid:*N0.artist.030623121652
database_utilities  sqlubMWResponse   Probe:5   
	Database:ARTIST
 
DiagData
ffff f817 3431 ff54 534d 00                   
	....41.TSM.
Listing 4: Errors related to the saving log files on the TSM server

A DBA is required to interpret the User Exit program return codes, in a manner that will obtain detailed information about the error condition. The return codes from the Tivoli Storage Manager APIs are located in the TSM API header file (dsmrc.h). A file is located in the TSM API installation directory.

>> find / -name dsmrc.h -print

/opt/tivoli/tsm/client/api/bin/sample/dsmrc.h
Listing 5: User exit program error code list location

The error message -2025 indicates the non-existence of the backup management class.

>> cat /opt/tivoli/tsm/client/api/bin/sample/dsmrc.h | grep 2025
#define DSM_RC_INVALID_MCNAME  -2025 /* Mgmt class name not found        */
Listing 6: User exit error code 2025

Tip 4. Tackling with DB parameter TSM password

There are four database parameters related to the TSM server configuration. After changing the database parameter, TSM_PASSWORD, you might be surprised with the result.

$db2 get db cfg for artist | grep TSM
TSM management class                 (TSM_MGMTCLASS) =
TSM node name                        (TSM_NODENAME) =
TSM owner                            (TSM_OWNER) =
TSM password                         (TSM_PASSWORD) =*********
Listing 7: TSM related database parameters, with enabled TSM_PASSWORD

As described in the IBM document "The Backing up DB2 using Tivoli Storage Manager," setting the TSM password parameter to some value would direct the DB2 system command db2adutl to read the newly defined password before accessing the TSM server. Unexpectedly, after changing the default null password to some value, a new database backup could not be made.

db2 =-> backup database artist online use TSM
Backup successful. The timestamp for this backup image is : 20040110150134

db2 =-> update db cfg for ARTIST using TSM_PASSWORD 'Artist'
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
DB21026I  For most configuration parameters, all applications must disconnect from this 
database before the changes become effective.

db2 =-> terminate
DB20000I  The TERMINATE command completed successfully.

$ db2 connect to ARTIST

   Database Connection Information
 Database server        = DB2/Sun 7.2.0
 SQL authorization ID   = artist...
 Local database alias   = ARTIST

$ db2 get db cfg for ARTIST | grep TSM 
TSM management class                    (TSM_MGMTCLASS) =
 TSM node name                            (TSM_NODENAME) =
 TSM owner                                   (TSM_OWNER) =
 TSM password                             (TSM_PASSWORD) = *****

db2 =->backup database ARTIST online use TSM
SQL1015N  The database must be restarted because the previous session did not conclude 
normally.  SQLSTATE=55025
....

db2 =->  backup database ARTIST online use TSM
SQL2062N An error occurred while accessing media.
Listing 8: Changing TSM_PASSWORD parameter

After some additional testing, the TSM password database parameter was reset to null, after which the database backup started working again.

db2 => update db cfg for ARTIST using TSM_PASSWORD ''
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
DB21026I  For most configuration parameters, all applications must disconnect from this 
database before the changes become effective.
Listing 9: Resetting TSM_PASSWORD to default value

In the previously mentioned document, under Appendix A, "Quick start/checklist for configuration" there exists additional information specific for Sun Solaris. It states, for proper DB2 backup functionality, the DBA needs to update the database configuration parameter TSM_PASSWORD using a NULL value. My conclusion is that there are some incompatibility problems with an existing TSM API client and the DB2 database version. Following this experience, use the TSM_PASSWORD only for a redirected restore, on your Sun Solaris installation.


Tip 5. TSM Communication problems


A period of successful TSM usage followed by a of series unsuccessfully backup operations, raises the standard question: "What went wrong if nothing has changed?" Rechecking the environment and database configuration parameters leads to the conclusion that only the network configuration between database server and the TSM server has been changed. The following error message has occurred during a regular online database backup on the TSM:


db2 => backup db artist online use tsm
SQL2025N  An I/O error "-50" occurred on media "TSM".
Listing 10: Backup error condition

Return codes from Tivoli Storage Manager APIs describe this problem as a TCP/IP communications failure:


cat /opt/tivoli/tsm/client/api/bin/sample/dsmrc.h | 
  grep -50   

#define DSM_RC_TCPIP_FAILURE       
  -50 /* TCP/IP communications failure      */
Listing 11: TSM error code explanation

In order to find more detailed information about the error requires the TSM API tracing files. TSM API tracing is enabled using traceflags and tracefile configuration
entries in the dsm.opt configuration file.


# cat dsm.opt
SERVERNAME              TESTTSM001
traceflags             service api
tracefile              /tmp/artist_tracing.log
Listing 12: Enabling TSM API tracing

Possible sources of the problem might be:

  • a problem with some database configuration parameters
  • a password problem between the TSM server, TSM API and DB2
  • a problem with the TSM server configuration
  • a problem in the network infrastructure connecting the TSM server and the database server

After enabling communication tracing and a series of connectivity tests, this problem has shown up. There were infrastructure changes on the network and the DB2 database server had been disconnected from a fast 100MB and re-connected to a slower 10MB network segment. This resulted in communication between the DB2 database and the TSM server having a longer delay than before and backup failed to finish. Luckily, we have some parameters at our disposal for fine communication tuning:

adsm> q opt
Server Option         Option Setting           
-----------------     --------------------     
CommTimeOut           900 (-> to 1800 ) 
Listing 13: Changing the TSM server CommTimeOut parameter

A change has been made to the CommTimeOut parameter. The value has been extended from 900 seconds to 1800 seconds. From the TSM Administrators Reference Guide, the CommTimeOut parameter has the following description:

"CommTimeOut
- Specifies how long the server waits (in seconds) for an expected client
message during an operation that causes a database update. If the length of
time exceeds this time-out, the server ends the session with the client. You
may want to increase the time-out value to prevent clients from timing out if
|there is a heavy network load in your environment or client will be backing up
large files. "

Tip 6. Checking Backup on the TSM server

In the DB2 version 7.1, IBM offered a new utility, db2ckbkp. This utility is used to:

  • test the integrity of a backup image and search for possible corruptions
  • display information that is stored in the backup header
  • display information about the objects and the log file header in the backup image

Detecting an unusable backup directly on the TSM could save precious DBA time. However, system utility db2ckbkp has one small feature, it cannot be used to check a backup on the TSM server. The DBA has to restore the whole backup file from the TSM server on the local filesystem, and than check it with db2ckbkp utility. Checking a TSM backup file for possible corruptions, using the db2ckbkp utility:

$ db2ckbkp ARTIST.0.artist.NODE0000.CATN0000.20040125010545.001
[1] Buffers processed:  ###############################################################################################
Image Verification Complete - successful.
Listing 14: db2ckbkp system utility

IBM has acknowledged that the db2adutil system command should be used for checking a database backup on the TSM server. An example of the TSM backup check:

$ db2adutl VERIFY FULL TAKEN AT 20040125010545.000

Query for database ARTIST

Retrieving FULL DATABASE BACKUP information.  Please wait.

   FULL DATABASE BACKUP image:

     ./ARTIST.0.artist.NODE0000.CATN0000. 20040125010545.000, Node: 0

   Do you wish to verify this image (Y/N)?

Read 4194304 bytes, assuming we are at the end of the image
 
Image Verification Complete - successful.
Listing 15: TSM related database parameters, with enabled TSM_PASSWORD

From the IBM documentation:

Verify option performs consistency checking on the backup copy that is on the server. This parameter causes the entire backup image to be transferred over the network.

The whole image will be read from the TSM server into a local memory buffer. (Not the

whole image at once, but piece by piece). Only a temporary file was written to the local disk. I have been testing this option, which is extremely useful, and did not find any problems even with large backup files. Taking measurements, some large backup files (100 GB), required only 10 minutes for checking. After testing with several files different sizes, I doubt that a backup file is entirely transferred to the local filesystem, as IBM documentation states. Nevertheless, this method is fully functional.

Conclusion

The explained situations reflect things that might shorten the TSM learning path. The TSM backup system is very powerful and very well suited. A DBA needs to test the TSM recovery process so that in the event a recovery is necessary, it will not be the first time. Half of the battle is knowing what features are available, and the other half is testing.

Related Articles:

» See All Articles by Columnist Marin Komadina

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved