SQL Server Performance Tuning: Pt. 2

Monday Aug 6th 2001 by Aaron Goldman

Having found the key performance issues of your system, it's now time to start optimizing. Follow along as author Aaron Goldman outlines the steps he uses to correct performance problems.

Optimization Techniques

In Part I of my SQL Server Performance Tuning and Optimization article, I identified ways of reducing the process of finding the location of SQL performance issues to a science. Unfortunately, fixing performance issues, the focus of part II, can be something of a black art. While there are some quick guidelines I have to offer, sometimes the answers and directions consist of trial and error and judgement calls based on experience or understanding of the engine which are beyond the scope of this article.

The whole odyssey reminds me of physics problems where you have to solve several equations to get the information you need in order to answer your original question. Where most people go wrong is forgetting to answer the original question, thinking they already have an answer. Our question was "How can we make this process faster?" We have now found where the problem is, but we don't yet have a solution.

The good news is there are often quick ways of improving performance. And there are techniques you can use that will have a high probability of being successful, though they may be more cumbersome.

Though my answers usually involve indexing, I have not been impressed by the Index Tuning Wizard.

Depending on how I see the problem I usually go through the following strategies of things to look for and try. In each case the same basic idea applies. Understand the SQL that is running slowly. Try to think of what is making it slow. Change the SQL or indexes or tables. Then rerun the statement to see if your change(s) worked. Sometimes you can spend hours trying to get from 2 seconds to 500 ms; other times 1 second will be reduced to 100 ms almost effortlessly. Be prepared for more of the former than the latter.

Errors, Omissions, Pleonasms, and Cursors

This is the first thing to look for, not because it is the most common, but because it should become apparent as you examine and come to understand the problem statements.

Oftentimes, a query is just wrong. Even wrong queries can return the right results, or appear to do so. A select without a where clause is a common mistake. For example, setting a variable from a table without a where clause:

SELECT @val = col from aTable

This will select all the rows from the table and the variable will end up with the value in the last row. On a fast machine with a medium dataset size, this could be missed--the statement will be a lot slower than it should be, but may not be slow enough to stop a developer or tester.
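
A minimal corrected form, assuming the row of interest is identified by a hypothetical key column named id:

SELECT @val = col 
  from aTable 
  where id = @id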

Sometimes an unnecessary redundancy (a pleonasm) is used (as opposed to those necessary redundancies):

From

Select aCol 
  from aTable 
  where anID in 
    (select anID from aTable 
       where anotherCol in (someVals))

To

Select aCol 
  from aTable 
  where anotherCol in (someVals)

Cursors are sometimes necessary, but many times they are used when not really needed. Performing a quick operation several thousand times in a cursor can be orders of magnitude slower than performing the same operation on all the rows in a single statement.
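
As a rough sketch (aTable, itemID, and price are hypothetical names), a cursor that touches one row per iteration can usually be collapsed into a single set-based statement:

From

Declare @itemID int
Declare curItems cursor for 
  select itemID from aTable
Open curItems
Fetch next from curItems into @itemID
While @@fetch_status = 0
Begin
  Update aTable 
    set price = price * 1.1 
    where itemID = @itemID
  Fetch next from curItems into @itemID
End
Close curItems
Deallocate curItems

To

Update aTable 
  set price = price * 1.1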

Searching for coding errors and irregularities often takes a good knowledge of SQL because they are often caused by inexperience. It's often hardest to see our own mistakes, so this is an area where another team member can and should be asked for help.

SARG's and Indexing

Indexing is really the heart and soul of optimization. Most optimization issues will bring up the need for an index or two, even if some of the other techniques are employed. Indexes are usually the fastest way to access data, and seeing how indexes are used by the Optimizer is vital to fixing performance problems.

SARG stands for search argument. SARG's are restrictions in where and join clauses that can be applied to indexes to select rows more quickly:

Select au_lname 
  from authors 
  where au_lname like 'w%'

In this statement, the column au_lname in the where clause is the SARG. If there were a million rows in the table, an index on au_lname would be a huge performance improvement. With the index, the above select could jump right to the W's and pull out the w% rows right away. Without the index there would be a table scan. Not only would every row have to be checked, but the number of data pages checked would be much larger (resulting in more time for searching) because the table has more columns and data than the index. The table could consist of 10,000 pages where the index might be only 1,000 pages, and the number of pages accessed to find the data in the index might be only a dozen. So this index would reduce a 10,000 page search to about 12 pages, an immediate performance gain of nearly a thousand times.

Remember, SARG's must match the first column in an index. The index (state, au_lname) would not help the above query and would not be used. It would be like looking for first names in the phone book; it's just not ordered for that.
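
For reference (the index names are arbitrary), the first index below can serve the au_lname query; the second cannot, because au_lname is not its leading column:

Create index ix_au_lname on authors(au_lname)
Create index ix_state_lname on authors(state, au_lname)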

Sometimes there are conditions that are understood but never stated in the query; perhaps they are applied later in code logic. A common example is a date restriction: adding it to the query itself can improve performance. Adding search arguments like this often speeds up a query because the Optimizer can then use different or additional indexes.
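
As a hypothetical illustration (the orders table, its columns, and the assumption that only recent rows matter are all invented here), stating the date restriction in the query lets an index with a leading date column do the work:

From

Select * 
  from orders 
  where customerID = @custID

To

Select * 
  from orders 
  where customerID = @custID 
    and orderDate >= @startDate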

Covered Index

Whenever considering indexes, consider a covered index for even better performance when retrieving large rowsets. A covered index is nothing more than an index that includes all columns referenced in a query. So a covered index is only a covered index in relation to particular queries. Repeating the above example:

Create index lname on authors(au_lname)
Select au_lname 
  from authors 
  where au_lname like 'w%'

The index on au_lname is a covering index for this query. What this means is that all the information the query needs is in the index data pages and the table data pages will not need to be accessed to execute the query. On the other hand if the query were:

Select au_lname, state 
  from authors 
  where au_lname like 'w%'

and the index still only covered au_lname, SQL Server would search through the index, and then for each entry it found it would have to load the data page and pull the single row off the 8 K page to get the state name. If 255 rows in the index qualified (which would all be on one index page), there would be 255 page accesses of the table data.

Depending on what percentage of rows qualified, the Optimizer might decide to do a table scan instead of using the index. The table might only be 100 pages long, but using the index, in the above example, several of the pages would be hit multiple times, so a straight table scan would be better. Alternatively, if the table data were 1,000 pages (8 MB) the Optimizer would probably see that the index use would be faster. And if the table were 10,000 pages, you could rest assured that the index would be chosen.

So even if the index is used, one index page will yield 255 additional page accesses to retrieve the data. In this case, a covering index would reduce the page accesses back down to one or two. If the index was on (au_lname, state) the data pages wouldn't need to be accessed and the simple index lookup would give all results from the query.
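
That covering index for the two-column query might be created as (the name is arbitrary):

Create index ix_lname_state on authors(au_lname, state)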

If insert/update performance and storage space were not problems, I would add an additional index (state, au_lname) for the following select:

Select au_lname 
  from authors 
  where state = 'NY'

It has the same keys as the above index, but with state at the beginning, the search will be much faster. Alternatively, the previous index (au_lname, state) is still a covering index, and even though it is not in a useful column order, it still provides a performance improvement over a table scan. A full index scan would require fewer pages than a table scan.

Clustered Index

Rarely do I rely on a clustered index for improved performance, since a covered index can easily beat the performance of a clustered index. A clustered index is really just a fatter covered index. The books will tell you there is only one clustered index. That is true. But if you have the disk space and insert/update performance is not a problem, you can repeat all the table columns (up to 16) in as many indexes as you like (up to 249) for the same or better search performance than a clustered index.
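
As a sketch, a wide nonclustered index that repeats the columns a set of queries needs behaves like a clustered index for those queries (the column list here is just an example):

Create index ix_authors_wide 
  on authors(au_lname, au_fname, state, city)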

Foreign Keys and Indexing

Foreign Keys can be a factor in insert, update and delete statements. Since foreign keys have to be checked, an update or delete in one table can result in a check in another table for the existence of related data.

Create table appointment(apptID int, ...)
Create table appointmentPerson(personID int, apptID int, ...)

When deleting an appointment, a check in the appointmentPerson table will be performed. If there is no index with a first column of apptID, an index or table scan will be performed checking every page in the table. As in previous examples, a useful index can make a huge performance gain in this operation.
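
A supporting index on the foreign key column (the name is arbitrary) turns that existence check into a quick index seek:

Create index ix_appointmentPerson_apptID 
  on appointmentPerson(apptID)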

Foreign key fields like this are also good candidates for indexing because of their likelihood of being involved in joins and used as search criteria. SQL Dev Pro, my freeware database utility, has a function to search your database for Foreign Key fields and suggest indexes based on FK's and the rowcounts involved. You can download the tool from www.itdevpros.com.

Move Functions Out of Search Criteria

Oftentimes, SQL functions will stop an index from being used. The getdate() function is a prime example:

From

Select * 
  from aTable 
  where datediff(day, theDate, getdate()) < 3

To

Declare @afterDate datetime
Select @afterDate = dateadd(day, -3, getdate())
Select * 
  from aTable 
  where theDate > @afterDate

Don't be afraid of Over Indexing

With storage becoming cheaper every day and most of the systems I'm working on being more search intensive than update intensive, I have not encountered problems adding extra indexes. While it is true they impede insert, delete, and update performance, usually these are single row operations which complete very quickly, while select operations more commonly require multiple tables and multiple rows. In most situations, the benefits of additional indexes will far outweigh their cost. YMMV.

From BOL:

Indexes not only speed up the retrieval of rows for selects, they also usually increase the speed of updates and deletes. This is because SQL Server must first find a row before it can update or delete the row. The increased efficiency of using the index to locate the row usually offsets the extra overhead needed to update the indexes, unless the table has a lot of indexes.

Then again, the indexes we have been discussing are probably not the ones used for row retrieval in updates.

Reformat Statements, Apply a Different Strategy

I have noticed that some IN clauses can be quite intensive for no good reason. It seems the Optimizer sometimes generates the entire result set of the subquery for each row of the outer query. I often rephrase an IN clause as a join:

From

Select * 
  from aTable 
  where id in 
    (select id 
       from anotherTable)

To

Select * 
  from aTable a 
  join anotherTable t on a.id = t.id

NOT IN can be even worse:

From

Select * 
  from aTable 
  where id not in 
    (select id from anotherTable)

To

Select * 
  from aTable a 
  left join anotherTable t on a.id = t.id
  where t.id is null

Where possible, use 'exists' instead of 'count', because exists stops at the first qualifying row, whereas count must examine every qualifying row to produce an exact answer.

From

IF (select count(*) 
       from aTable 
       where criteriaCol = @someVal) > 0

To

IF exists(select * 
       from aTable 
       where criteriaCol = @someVal)

Count(*) can also be more efficient than count(colName). Count(colName) eliminates nulls from the count, which is not always what you want, and count(*) is free to count the rows in the table by the fastest means available.
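
For example, with a hypothetical table where middleName allows nulls:

select count(middleName) from aPeopleTable  -- counts only rows where middleName is not null
select count(*) from aPeopleTable           -- counts every row, by the fastest means available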

Statistics - Update Statistics

This usually doesn't play a part in my optimizations because I like to have statistics updated regularly, daily if possible, on all tables with sp_updatestats run from a job. After that, run sp_recompile against all stored procedures and triggers. This is the code, using my sp__execAllInTable procedure:

exec sp_updatestats

select 'SP_RECOMPILE ' + quotename(name) stmt
  into #ExecMe
  from sysobjects 
  where type in ('P', 'TR')

exec sp__execAllInTable

SQL 2000 automatically updates statistics, but doesn't automatically recompile code. Stored procedures and triggers will be recompiled after the SQL Server is restarted in either version.

The subject of statistics leads to data distribution issues, which I'll save for another article, and to the percentages of qualifying rows I mentioned above.

Temp to Perm Table

I have often found that indexes on temporary tables (#tables) are not well used. Sometimes making a permanent user table in the database works better. For single user (likely batch) processes, you can truncate the table before each use. For multi-user processes, one way to do it is to add a spid column to the table and delete by spid before each use:

Delete permTempTable 
  where spid = @@spid
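
The table itself and the spid-tagged insert might look like this (the columns and sourceTable are hypothetical):

Create table permTempTable
  (spid int, id int, val varchar(50))

Insert permTempTable (spid, id, val)
  select @@spid, id, val 
    from sourceTable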

Denormalize

This is usually indicated by queries that join too many tables (five or more) or that lean on functions like min and max. A multiple query strategy occasionally helps here: break the statement up into two or three statements that run quickly, possibly joined by temp tables.
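
As a rough sketch of that strategy (orders, orderDetail, and their columns are hypothetical), an aggregate over a large join can be staged through a temp table:

Select orderID, max(shipDate) lastShip
  into #lastShip
  from orderDetail
  group by orderID

Select o.orderID, o.customerID, l.lastShip
  from orders o 
  join #lastShip l on o.orderID = l.orderID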

Denormalization is one of the least desirable courses of action. Denormalizations have to be maintained. This is a judgement call--a small amount of denormalization can tremendously help a difficult situation. Unfortunately, this can be a slippery road leading to a choking morass of code and a maintenance nightmare. In general, if the denormalization cannot be maintained by a single trigger or a single stored procedure, I try to avoid it.

I will shortly be doing performance testing on indexed views in SQL 2000 in place of denormalizations.

Rearchitect

This is like denormalization, only worse from a labor standpoint. If the schema truly can't support the process at the required performance level and all else fails, a reworking of the basic schema may be in order.

Satisfice, Don't Optimize

What is acceptable performance varies from situation to situation. If I can't achieve orders of magnitude improvement (10, 100, 1000 times faster), I am usually disappointed. But, that amount of improvement will not be needed in every situation.

I used to work for a very smart man who always said "Satisfice, don't optimize." Since we've been attempting optimization all along, maybe we should step back. Our goal was to make things faster, not necessarily as fast as they can possibly be, but just fast enough to make the users happy. While it pains me to let a query go at only 2 or 3 times faster, sometimes, that's good enough.

SQL 2000

My recent testing suggests that quite noticeable gains can be had by upgrading to SQL 2000. It may even alleviate the need for some of the above fixes.

Please Comment

For comments or questions about this article, write me at aaron@itdevpros.com. If you have other optimization suggestions, I'd love to hear about them.
