Monitoring SQL Server with Performance Monitor Charting and Alerts
Monitoring production systems can alert you to problems before they get out of hand. Performance Monitor (perfMon) is a good tool for this. You can find perfMon under Programs\Administrative Tools (Common) in the Windows NT Start Menu. The exe is in %SystemRoot%\system32\perfmon.exe.
When you start perfMon, you will start with what might be a perplexing blank screen:
The performance monitor does charting, alerts, log files, and reports. This article will detail charts and alerts. These options are selectable from the buttons on the top menu (chart is selected above) and from the view menu.
In each of the sections the use is similar in the way you add items you want to see or track. The plus button brings up a list of items to track. Edit\Add To Chart gives the same function. The following screen will appear:
Computer: The target will be the local machine until you select the target machine. Pressing the "..." button will scan the network for available computers. You can also enter the computer name into the textbox. You can scan multiple machines in the same chart if you like.
Object\Counter\Instance: These three items taken together define a chart line you can monitor. Objects can contain many counters which can contain many instances. Instances could be processes (exe's) or processors (0 - n), etc. For an explanation of a counter, just use the Explain button on the right.
The charting section shows current activity for selected counters. To add counters to your chart, click "Add" on the "Add to Chart" screen. After you have added several counters, your chart may look something like this:
It is highly recommended that you save your chart settings (File\Save) at this point and start the program next time by opening the settings file.
Since there are dozens or hundreds of counters, one of the most helpful things I can do is point out a few that are most useful in making sure a production SQL Server system is running well.
Useful Chart Counters:
These are in the form Object--Counter--Instance.
System--Total Processor Time: This is the percentage of time all processors spend doing work. If this is 100% your system is being overtaxed. Admin preferences vary, but this should only rarely top 80%. This is usually the single most important determinant of perceived performance.
SQLServer: SQL Statistics--Batch Requests/sec: This tells how many requests users are making against SQL Server. This usually moves in step with processor time. It shows people are actually using your server.
To highlight a chart line (makes the line bold white) highlight the description in the bottom grid and press the backspace. It is also convenient to move up and down through the list and the highlighted chart line will be changed as the highlighted item is changed. You can also double click an item and see its description.
The other important setting for charts is how often it updates. Too often, and the polling will tax the server. If not often enough, you will not get a realistic picture of current activity. Somewhere between one and five seconds is usually adequate.
Alerts are accessed through the second button (from left) or the View\Alerts menu. Alerts will let you monitor for more rare or critical items. You may not be around when the computer starts using 100% of CPU, so you will miss it as the chart overwrites itself. But an alert can tell you when it happened and even notify operators.
You add counters to alerts in the same way as charts, with the "+" button or by Edit\Add to Chart.
When setting up alerts there are two extra things to do. You must specify the alert threshold, the number the alert must pass to trigger a message. Optionally, you can also run a program when the alert fires. As with charting, you should save your alerts to file.
Useful Alert Counters:
I did not include some alerts that normally are 0 or 100 all the time on the chart settings above, you may wish to chart these as well. These are in the form Object--Counter--Instance.
System--Total Processor Time: Alert on Over 80(%)
This is the percentage of time all processors spend doing work. If this is 100% your system is being overtaxed. Admin preferences vary, but this should only rarely top 80%. This is usually the single most important determinant of perceived performance.
SQLServerLocks - Number of Deadlocks/sec - Database: Alert on Over 0
Deadlocks can compromise data integrity for applications that don't handle every operation with correct transactions (any application). They can also frustrate users depending on how the front ends handle them. Any deadlocks are a cause for concern.
SQLServer: Buffer Manager - Buffer Cache Hit Ratio: Alert on
Under 97 (%)
Most systems will see their cache hit ratio forever pegged at 100%, this yields optimal performance, because SQL Server never has to go to disk to retrieve data. Minimal cache missing will not likely be a problem and is to be expected.
Memory - Pages\sec: Alert on Over 2
This tells how often the system uses the page file. When more memory is requested by applications than physical memory available, the disk will be used. This should always be 0.
Here is an alert screen with two alerts triggered:
Send network Message--Net Send
This will send a popup message to the specified network user. Setting this is highly recommended.
As with charting, you set the update interval. Too often, and the polling will tax the server. If not often enough, you may not catch a triggering event. Somewhere between one and five seconds is usually adequate. Also, if a counter is exceeded for a long time, and you have Net Send or Run Program enabled, you may receive a lot of messages.
You can save triggered alerts to a comma or tab separated file from File\Export Alerts. This will help you spot longer term patterns that will be lost if you have to stop the performance monitor or reboot the machine.
Here's how the exported alerts above look when opened in Excel:
|Reported on \\WCTR04W119|
|Time: 3:47:16 PM|
|Data: Current Activity|
|Interval: 3.000 seconds|
|6/25/01||3:42:23 PM||85.247||> 80.000||% Total Processor Time||System||\\hgn_sql1_nt|
|6/25/01||3:43:07 PM||80.491||> 80.000||% Total Processor Time||System||\\hgn_sql1_nt|
Unfortunately, perfMon will not alert you when a system goes down. It will probably hang. So when you see perfMon hanging, your server is either thrashing or down, and users are probably similarly suffering. If you are lucky, one of your alerts will be tripped before a machine goes down. Otherwise, I would suggest another monitoring program. I have seen a small SQL application that will usually be adequate. It just queries the Server and sends an alert if it encounters a timeout.
More detailed information about the performance of individual stored procedures and queries is obtainable through Performance Monitor, the subject of my next article.