Microsoft SQL Server performance monitoring with Microsoft Windows NT Server Performance Monitor

Saturday Dec 19th 1998 by Alexzander Nepomnjashiy

Included in the wide range of responsibilities for database server administrators, (besides installations, set-up and configuration, troubleshooting and user support), are tasks such as support and control of the database server performance. Correctly planned controls reveal problems at the stage of their origin and allow the administrator to take measures toward their resolution.

Included in the wide range of responsibilities for database server administrators, (besides installations, set-up and configuration, troubleshooting and user support), are tasks such as support and control of the database server performance. Correctly planned controls reveal problems at the stage of their origin and allow the administrator to take measures toward their resolution. It's common knowledge that it's much easier to prevent problems, than to fight their consequences.

Performance monitoring evaluates the efficiency of the organization of your server as well as the construction of database applications. Based on monitoring results it's possible to rebuild applications and system options to achieve peak efficiency and to avoid bottlenecks. A bottleneck is a situation in which the general performance of the system is restrained by one of the hardware components of the server, which works on limited possibilities.

Let us suppose that your database server has a SCSI disk subsystem, a powerful Pentium3 processor, but only 64 megabytes RAM. In this case, the RAM is the bottleneck of the system; the performance in this case is reduced because the system has to work with a hard drive, exchanging data between the RAM and a swap file. Even with the replacement of components, (processor, hard drives), for a more powerful and productive environment, the general increase of performance will not be noticeable until you increase the RAM volume. On the other hand, even with sequentially increased accessible RAM volume, improvement of productivity will eventually be slowed down by other bottlenecks. It is quite possible that in this case the processor will have to be replaced or a faster hard drive (with a greater spindle frequency of rotation), will need to be installed in the system. Replacement of the components ends with the appearance of the next bottleneck in the system and so ad infinitum.

There is a consensus of opinion that it is impossible to achieve ideal performance, however it is quite possible to achieve a system with a performance that satisfies both you and your users. To achieve optimal performance you'll need to satisfy two conditions: 1) to reduce the response time of the system and 2) to increase capacity. The response time is the time slice between the system obtaining the user's query and returning to the user the first line of a resulting set. Capacity is the number of queries which the server is able to process during the defined period.

To analyze system performance, in order to reveal bottlenecks, it is necessary to possess certain statistical material. For this purpose, it is necessary to create a performance template (performance baseline). You then observe the system during periods of typical activity, taking readings of meters in defined time intervals.

It is desirable to create a separate template for each object of the system--a processor subsystem, a RAM subsystem, hard drives and network interfaces. The material obtained can be used both for the operational analysis of the concrete server, and for a comparison of congestion across all of the servers in the company. Based on given templates it's possible to make a decision to upgrade the server, or reallocate its load between other servers.

For example, databases that are often used may be on one server, while an identical server, (identical as far as hardware performance), sits idle since the databases placed on it are very seldomly used. In this case it is necessary to transfer part of the databases, and to switch part of users' calls to the second server to balance the load of both servers.

Before you begin to monitor your database server, it is necessary to define which tasks will be required to fulfill your needs. Besides performance control, there are other methods of monitoring with their own specifics and methods of realization. For Example:

  1. information accumulation about users activity. The statistical information obtained this way may be used both for the analysis of the safety system, and for obtaining information about transactions fulfilled by users.
  2. debugging of applications and stored procedures code. After the code of the application or the stored procedure is written it is necessary to trace what it will allow, not only to make sure of its serviceability but also to observe all processes of its execution, while simultaneously optimizing the code.
  3. preventive monitoring in order to reveal problems in database server operation. Search for errors in Transact-SQL queries (and stored procedures) code. You may reveal certain problems connected with the operation of computer components or obtain data necessary for the diagnosis of such problems.

During the planning stage it is necessary to define the appearance of the events you will watch. You need to select not only the general direction of monitoring, but also concrete objects, observation of which will allow for the completion of set tasks. It is also necessary to decide what information, related to selected events, needs to be collected. So, if the activity of a disk subsystem is to be monitored--it is necessary to collect information on the number of "read/write" operations, the number of these operations per second, and the queue lengths to disk to be serviced. Having placed appropriate filters, it is then possible to begin collecting the performance information.

It is not necessary to analyze the arriving information immediately. Meter results can be saved in special files for subsequent analysis. Some monitoring devices, (for example - SQL Server Profiler), allow you to play back a sequence of events, using the information contained in such a file.

Choosing Performance Monitoring tool

Microsoft SQL Server, along with Microsoft Windows NT, represents an extensive toolkit for monitoring. The choice of which concrete tools/utilities to use is determined first of all by which tasks the administrator faces. Microsoft SQL Server 7.0 offers the following monitoring tools:

  • SQL Server Enterprise Manager
  • SQL Server Profiler
  • Stored procedures
  • Transact-SQL commands

In turn, the operating system Microsoft Windows NT features an additional set of monitoring tools:

  • Windows NT Performance Monitor
  • Windows NT Task Manager
  • Windows NT Event Viewer

As an alternative to the firmware monitoring utilities, you may use applications supplied by 'third-party' companies that support the SNMP protocol as well as SQL Server 7.0.

Server monitoring with Microsoft Windows NT Performance Monitor

Performance Monitor is used to observe the operation of the server system, as well as all events happening on that system. Both the computer on which Performance Monitor is run locally, and a remote computer (running Microsoft Windows NT) may be included in the role of the system being monitored. It is possible for a centralized administrator to monitor operation productivity of both the remote server and any server in the local network without leaving his workstation. Remember, at the execution of the monitoring, there will be an increased load on the network that may affect its capacity.

Performance Monitor is a Windows NT 4.0 Administrative Tool for monitoring the performance of Windows NT workstations and servers. It uses a series of counters to track data, such as the number of processes waiting for disk time, the number of network packets transmitted per second, and the percentage of processor utilization. You can watch this data in real time, log it for future study, use it in charts and reports, and set alerts to warn you when a threshold value is exceeded.

Processor:% Processor Time shows CPU loading. In an SMP environment, (having more then 1 system processor), the administrator can observe how busy every processor is. In addition, the same parameter can be used for the definition of processor utilization by each Windows NT thread. If Processor:% Processor Time counter shows an average value in a range from 80 up to 100% (and hard drive and network related counter values are low), consider either a faster processor or an additional processor.

Note that short-term levels of the counter Processor:% Processor Time of 80%, or even peaks (up to 100%) are not necessarily a sign of productivity problems with a processor subsystem. Since operational resourses of the processor are used not only by the database server, you should define whether SQL Server is the cause of high utilization of CPU time. Use SQLServer:CPUtime to define share of SQL server in Processor:% Processor Time. Having determined that the cause of increased load is SQL server, and also which process provokes it, you should analyze the project of query executed at this time. Make sure that indexes are used optimally in queries.

The query may be cached well, yet at the same time overload the I/O subsystem, causing exessive CPU cycles. It may mean that when designing the table the index was not optimally planned. If the query plan is optimal, it is possible to troubleshoot the problem by scaling; for example, add processors or install a more productive processor. On the other hand, if a value of Processor:% Processor Time counter is constantly low--it is not nesessarily good. For example, if the CPU graph is consistently low (less than 50 percent), and the %Disk Time is consistently high, this can indicate an I/O bound state.

Processor:% Privileged Time is convenient for determining excessive I/O loading. If the average value exceeds 20%, and Processor:% Processor Time is much lower than 80%, it means that SQL Server is putting a strain on a subsystem of input-output. You need to analyze the database design, workload on RAID--controller and/or the network card.

System:Processor Queue - The number of tasks waiting for processor time. If this value is consistently two or more, it means that the central processing unit is overloaded. Additional processor power is necessary to solve this problem.

System:Context Switches/sec - is the rate at which the processor is switched from one thread to another, and is an indication of which threads are getting processor time.

Process:Thread Count - number of active threads. The value for this counter along with System:Context Switches/sec can be used for optimal configuration of SQL server to reduce excessive CPU utilization.

Process:Virtual Bytes - determines what memory volume uses SQL server and which applications do not use it effectively.

Process:Working Set - memory size used by the process. The modification of SQL server configuration after the analysis of these counters will allow optimizing a memory allocation between a database server, an operating system and other applications of a database server.

SQLServer:Cache Hit Ratio - for well balanced applications, the number of hits in the cache should be "near" 100%. Often, reaching a high level of hits in the cache is achieved by a simple extension of the RAM,(adding additional memory modules). Also, you can use a technique, such as SQL Server trace flags, for example trace flag 1081. Trace flag 1081 allows the index pages to make a "second trip" through the data cache. When SQL Server needs to flush an index page out of cache to bring a new page in, it chooses a different page unless this particular index page has already been bypassed once. Therefore, index pages are allowed to stay in the data cache longer.

And in conclusion some tips to memorize:

To optimize the allocation of SQL files of the server try to follow the following rules:
1) Always use disk arrays RAID. Their usage ensures reliability and productivity of a disk subsystem.

2) The more disks that are in the array, the higher the general productivity of the disk array. The more channels on the RAID-controller, the better the bus is loaded. The more RAID-controllers there are, the more effective the usage of the disk array. The more buses on the motherboard, the more of the above-mentioned components may be installed on the server thereby increasing productivity as a whole (if processors "cope"). Further, there will already be clusters. By the way, even in the cluster-less system try to use RAID-controllers, as they can work in a cluster (For example, do not initialize SCSI bus when power is turned on).

3) Locate files of sequential access and files of a failed access on different disk arrays (not logical disks created on one RAID the array). As a rule, logs of transactions represent files of sequential access, and files of databases - a failed access. For example, you can place LOG-files on RAID 1, and the database on RAID 5 or 10 (= RAID 1 + 0). Further, if you do not want to mix your data with system data, you may create two data files in two different file groups. The first file group will contain system objects, and the other will contain your objects. It will not cause any noticeable scoring in productivity, but will allow dividing essentially different data types. The principle of diversity of the data in file groups, according to access type to the information, can be also applied to tables. Significant management complications will be paid back with interest by the growth of productivity of the application.

4) Place files of sequential access on different disk arrays. Often the server should access several tables and/or indexes simultaneously. Optimal allocation of such files with the help of their inclusion in different file groups, (located on different disk arrays), may also positively affect the general productivity of the application.

Mobile Site | Full Site