DBA Call to Action: Zeroing in on Performance Problems

Friday Feb 13th 2004 by James Koopmann
Share:

Have you ever been asked to help with the evaluation of a database that is not performing optimally? Here is a quick guide to help you if you just do not know where to start.

Have you ever been asked to help with the evaluation of a database that is not performing optimally? Here is a quick guide to help you if you just do not know where to start.

Thrown Into the Fire

In my early days of being a database administrator, I was thrown straight into the fire of database performance issues. Now, I did not have any experience, except for the education of database systems from my local college, and was at the mercies of other professionals around me. The problem was that many of them did not have much experience either. I would often asked myself why in the world someone would ask me to work on database systems that were the lifeline of their company and would trust someone with such little experience. The reason they let me 'play' with their systems was that they were at my mercy combined with a few things happening with their database systems in general. Users were beginning to ask more and more of their databases, they were storing more and more data, the amount of people with real credentials to tune a database was low and computer hardware was still not quite robust enough for the solutions in which they were asked to take part. Today, all of these issues are still true for the most part except for computer hardware. The hardware of today can mask the performance problems of the typical database environment for quite some time. The simplistic nature of buying hardware, throwing a database on it, creating objects to store the information, and users creating applications against the database does create a false sense of stability that typically will only break down over time when true scalability of the system comes into question.

What does this all mean? Well, if you stick in this field of database administration long enough, you will in due time be faced with the issue of having to investigate a performance problem. What makes matters worse is that no one will typically know where the problem arose from or where to start. It is your duty as a database administrator of the system to track down the problem so that you can effectively come up with a solution. So where do you start? This article will give you an approach to use where you can begin to investigate the problem and hopefully come up with your own feelings and convictions about what must be done with this highly stressful and possibly confrontational issue.

User Input

Do not underestimate the information the users of database systems have when first determining the problem with a system that is not performing well. It is true they cannot tell you anything about memory consumption, how the data is laid out across your storage subsystem or how an application is written, however, what they can do is verbally explain to you the pain they are experiencing when they use the system. They have a unique and distinct experience that can help you zero in on the use cases that are experiencing problems. Zeroing in on the events or tasks that cause the problem will help you later down the path when you want to re-create the problem. These problems are what you should concentrate on, and since they are the pain points experienced by users, should be your first line of attack to begin easing the perceived performance problems. You may still have many issues that are occurring that you will need to fix but if some of your users are happy you will have a better, and less stressful, chance of sleep at night.

When talking to users, it is very important for you to have a set of questions that you can ask to extract all possible information on the system from these users. In Listing 1, I have given you a few questions that you can incorporate when you talk to the users of the system. The idea you should come away with is that you need to get as much information as possible about how users are feeling about the system while instilling in them the importance of the problem to you and your commitment to getting it resolved. In addition, you will notice that these questions are geared to extract information about how the system is behaving now and relating it to how the system has behaved in the past. By doing this you will begin to get an idea for the change of events that might have happened between a system performing properly and poorly from the users' perspective.

Listing 1
Questions you can ask your users

1.  In you own words, explain the problem.
2.  When did you first experience the problem?
3.  Does this problem always occur?
4.  Is anyone else experiencing the problem?
5.  Are you still able to do your work?
6.  Compare and contrast the time it takes for you to complete the task when you experience a problem with when you would not experience the problem.
7.  Did the problem immediately manifest itself or was it gradual?

System Input

When being thrown into a performance problem, you should initially and quickly determine if there are any obvious high-level performance issues. You should look at CPU usage, memory usage, network usage, and the internal Oracle high-level statistics through the V$SYSTEM_EVENT view. Listing 2 gives you a quick list of tools to use that can give you a quick glimpse into your database system.

Listing 2  Where to look for statistics

1.  top
2.  vmstat
3.  sar
4.  iostat
5.  netstat
6.  V$SYSTEM_EVENT

While top, sar, iostat, and netstat will give you real-time statistics, V$SYSTEM_EVENT will give you an accumulated representation of database statistics from when the database was started. Listing 3 will give you a quick and easy method to determine what the events are that are having trouble within your database. This is where your interview with the users comes into play. Make sure you take your first snapshot of the statistics before the users execute the task that they are experiencing as a problem and then after the task completes you need to take an ending snapshot of statistics. After you have both beginning and ending statistics, issue the SQL provided in Listing 3 to determine the true events and wait times for them.

Listing 3
Method to get true statistics for Oracle wait events around a user task/workload

1.Create table beg_system_event as select * from v$system_event
2.Run workload through system or user task
3.Create table end_system_event as select * from v$system_event
4.Issue SQL to determine true wait events
SQL> SELECT b.event, 
(e.total_waits    - b.total_waits)    total_waits,
(e.total_timeouts - b.total_timeouts) total_timeouts,
(e.time_waited    - b.time_waited)    time_waited
FROM beg_system_event b, 
          end_system_event e
 	WHERE b.event = e.event; 

Make it Simple

At first glance, you may think the above approach to be too simplistic. While you may need to go deeper into a problem area, it would be pre-mature to begin tracing all the applications and users within your system before you get a good handle on the high-level health of your database. I have seen too many DBAs begin to turn on tracing, shut down applications, and literally take over the database with high performance sucking DBA tasks that mask, hide, and provide too much detail before they know where the problem resides. I cannot count the number of database systems I have been asked to evaluate and the issue will reside somewhere in the network or on a client machine that was not performing properly. Performance problems outside of the database were all seen by looking at the high-level statistics and noticing the database was not performing any work what so ever and thus could not be involved in the performance problem. Just imagine the time and effort wasted if you went in and started tracing user sessions and going through trace output to only notice, the sessions were doing nothing.

Going a bit deeper

When you have zeroed in on what the high-level performance issues are, through the high-level statistics, you can now begin to delve deeper into other Oracle related statistics to help you in your diagnosis. Typically the way I do this is through the capture of beginning and ending statistics for other V$ views the same way I did in Listing 3. These also should be wrapped around the user task or workload that is performing poorly. Listing 4 gives you the base tables that I always collect beginning and ending statistics for. After you have collected the statistics, all you need to do is take your favorite DBA script, relate the two tables of statistics by a key and subtract any values with which you are concerned.

Listing 4
Other important V$ views for zeroing in on performance areas

1.  V$SYSSTAT
2.  V$FILESTAT
3.  V$WAITSTAT
4.  V$LATCH
5.  V$LOGFILE
6.  V$ARCHIVED_LOG
7.  V$LOG_HISTORY
8.  V$LIBRARYCACHE
9.  V$ROLLSTAT
10.  V$PARAMETER

Your Input

All I can tell you is to stick to your guns. Identify the issues with relation to the users and the problems they are experiencing. Your success as a DBA is the pain you can alleviate from your users of the database system. Find those events in the database or system level resources that are consuming the most amount of time and try to reduce them by applying what you have learned about database systems. Try to acknowledge when you are going down the wrong path and do not fall into the trap of trying to solve problems that do not exist. Prioritize your efforts to reduce the pain felt by users and you will succeed.

» See All Articles by Columnist James Koopmann

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved