Loading data into SQL Server databases is one of those tasks that DBAs perform routinely. When you load large amounts of data into a database, it is important to do that efficiently. Not all load processes are created equal. It is important to know how each method works so you can choose the most appropriate load routine for your data load situation.
When loading large amounts of data into SQL Server databases you need to consider how logging information affects performance. There is minimal and full logged operations. This white paper goes into how to minimize logging of your bulk insert process, and discusses the myths about transactions and minimally logged operations. In addition, it also talks about how a trace flag, which was introduced in SQL Server 2008, can be used to further control logging of indexed tables.
This paper discusses the following different approaches you can use to bulk load data:
· Integration Services Data Destinations
· BULK INSERT
· INSERT SELECT
· SELECT INTO
There is an explanation of each one of these load operations, and how they compare against each other. This provides you with a way to determine which bulk load method works good for your situation. There is also discussion on how you can improve loading data into HEAP storage and the how loading can be handled on tables with clustered indexes.
This paper clarifies the difference between the BATCH_SIZE and ROWS_PER_BATCH options. The BATCH_SIZE you select can greatly affect how long your load operation might take. It is best to test a number of situations to determine the optimize BATCH_SIZE setting for your situation.
To help you determine what kind of load operation should be performed a number of different situations are analyzed by this paper. These different situations will provide you with guidance in determining the best approach for loading data. To further help with the decision making process, a convenient decision tree guides you through yes and no answers to can quickly help you determine the most appropriate method to use for a given situation.
A deep dive into the BULK LOAD process is examined. This section of the paper looks into parallelism and partitioning options. Knowing how to use parallelism and partitioning can greatly optimize your processes. Also discussed are waits statistics and performance counters. This discussion helps to identify how you should monitor your load processes to identify potential performance bottlenecks. By reviewing the different wait types and understanding, the different performance counters will provide you with the information you need to tweak your load processing to optimize the load performance. Additionally, how you configure your I/O subsystem and the file placement of your database and raw data files can affect performance. This paper talked about how to optimize your I/O subsystem.
There are many things to think about when you bulk load large amounts of data. By looking at different aspects of your bulk load processes, and ensuring your environment is configured correctly, and the right loading method is used can make or break your large data load operations. You need to make sure you understand all the different options and/or configuration settings you can use, especially if you are loading millions or billions of records at a time.