Are your index statistics up to date? The SQL Server database engine uses statistics to determine which execution plan to use when a query is compiled. Index statistics are used to help the SQL Server engine determine the cardinality of different column values in a SQL Server table. If you want to give SQL Server a fighting chance at picking an optimal execution plan then you need to make sure you maintain your index statistics. This is especially important for table that contains lots of rows. If you have out of date statistics then SQL Server may pick a less than optimal execution plan. This article will provide you with an overview of what statistics are and how to deal with out of date statistics.
What are SQL Server index statistics?
Index statistics contain information about the distribution of index key values. By distribution, I mean the number of rows associated with each key value. SQL Server uses this information to determine what kind of execution plan to use when processing a query. As you insert, update, or delete rows in a table, the statistics on the updated table become out of date. They become out of date because SQL Server doesn't update the statistics every time you change a row in a table.
For SQL Server to obtain the distribution of key values, it needs to sample the data. SQL Server does this sampling a couple of different ways. It can either scan every row in the table to accurately calculate the statistics, or it can sample a percentage or number of rows to determine the distribution of key values. You can imagine how expensive it might be to create statistics by reading every row for a very large table containing billions of rows.
When are index statistics updated?
It is important to understand when SQL Server updates your index statistics. The AUTO_UPDATE_STATISTICS database setting controls when statistics are automatically updated. By default this database setting is set to true, which means your index statistics will be automatically updated.
Once statistics have been created, SQL Server then determines when to update those statistics based on how out-of-date the statistics might be. SQL Server identifies out of date statistics based on the number of inserts, updates, and deletes that have occurred since the last time statistics were updated, and then recreates the statistics based on a threshold. The threshold is relative to the number of records in the table. This means as you add and update records in a database table the index statistics slowly get out of date, until SQL Server determines you have added and/or updated enough records and then it recreates the statistics. Therefore with large tables with millions or billions of records, SQL Server may require that a few hundred thousand or millions of records need to be added, updated, and/or deleted before the statistics are automatically updated. With tables that contain millions of records, this means a lot of records need to be updated, inserted and/or deleted before SQL Server will update statistics automatically.
The problem with out of date statistics
The problem with having out of date statistics is that SQL Server might not know all the different possible values there might be for an indexed columns, and therefore the database engine will pick a sub-optimal execution plan. When you have tables with billions of records and you add millions of records daily, there may not be any statistics associated with the recently added records. Since there are no statistics associated with these new records added, SQL Server may not be unable to identify efficient execution plans when you try to select records based on key values associated with newly added records. Therefore it is important to make sure your statistics are up to date if you want to give the SQL Server engine current information so it can pick an optimal plan based on the data currently in the tables being queried.
How to determine if your statistics are out of date
SQL Server keeps track of the last time statistics were updated. It even keeps track of the sampling method used when it last created statistics. All this information is there for you to explore to determine how out of date your statistics might be.
There are a number of different ways to view when your statistics were last updated. The first method is to use SQL Server Management Studio. To use this method expand your database in Management Studio, expand the table for which you want to view index statistics, expand the "Statistics" item, and then right click on the index for which you want to view statistics, and then select the "Properties" item. This will display a window similar to this:
Near the bottom of the right pane of this window you can see these statistics were updated on 1/11/2011 at 4:48.32 PM.
Another method to display when indexes were last updated is to use the STATS_DATE function. You can retrieve the statistics date by providing the object_id, and index_id to this function for the index in which you want to view the statistics date. Here is a sample script that uses this function to display index statistic date for all user defined indexes:
SELECT schema_name(o.schema_id) AS SchemaName
, object_name(si.object_id) AS TableName
, si.name AS IndexName
, STATS_DATE(i.object_id, i.index_id) AS StatDate
FROM sys.indexes si
JOIN sys.objects o ON si.object_id = o.object_id
JOIN sys.indexes i ON i.object_id = si.object_id AND i.index_id = si.index_id
WHERE o.type <> 'S' -- exclude system objects
AND STATS_DATE(i.object_id, i.index_id) IS NOT NULL;
You need to use the "StatDate" column in the output from this script column to determine the last time the statistics were updated.
How to manually update statistics
Since SQL server does not do that good a job of updating statistics automatically for very large tables, you need to manually update your statistics if you want to get optimal execution plans. One of the methods you can use to update your statistics is to drop the index that needs updated statistics and re-create them. This will recreate your statistics but this is not the most efficient way to update statistics.
Another method you can use is to execute the “sp_updatestats” system stored procedure. This stored procedure will update all the statistics in your database that require being updated. Behind the scenes, this process uses the UPDATE STATISTICS command to actually update the statistics. This stored procedure doesn't have a lot of options, although it does allow you to specify RESAMPLE, which tells SQL Server to use the sampling method that was used last time it calculates statistics. Without specifying the RESAMPLE option, SQL Server automatically determine how much of the data to sample in order to create the index statistics. The problem with this method is you don't have control over whether or not statistics are recreated since it determines which indexes are updated automatically. I don't know about you, but I want a little more control over creating my statistics.
If you want greater control over updating your statistics you can use the UPDATE STATISTICS stored procedure directly. This stored procedure can be used to update statistics on a single index, or all indexes on a table. By using this stored procedure you can make sure the out-of-date statistics on your large indexes, or any other index for that matter, are up to date. This stored procedure allows you to identify the sampling method that will be used when creating statistics. The sampling method determines how accurate the index statistics will be, since the more records sampled to more accurate the index statistics. There are three different sampling methods you can use: FULLSCAN, SAMPLE, and RESAMPLE. The FULLSCAN sampling method will read every row in a table to update the statistics. This might take a long time for very large tables. By using the SAMPLE option you can specify the number of rows to be sampled by providing a row count, or percentage of rows. The last sampling method is RESAMPLE, when you select this method of sampling, SQL Server takes the last sampling rate used for creating the index statistics and uses that sampling method to create the new statistics. If none of these options are specified, then SQL Server will calculate the number of records it samples to calculate statistics.
For more information about both of these stored procedures review Books Online.
Help the SQL engine by updating statistics
It is amazing how often I find that DBAs don't manage their statistics. In a lot of cases if tables have a small record count, then the automatic rules SQL Server has for updating statistics work well at keeping the statistics up to date. But when a database has tables with millions or billions of records, sometimes statistics are not updated often enough, and then you find you have lots of sub-optimal execution plans. Help out your SQL Server engine in optimizing your queries by building a process that makes sure your statistics are updated periodically.