If your application has fields where users key in long free form text strings, then you probably have been storing the data entered in a text column within SQL Server. Storing data in text columns was your only option in SQL Server 2000 for any field that might exceed 8000 bytes worth of data. Working, updating and searching text columns was not the easiest thing to do in SQL Server 2000. Well Microsoft has come to the rescue for managing large text fields in SQL Server 2005, by creating a few new large value data types. This article will explore these new data types.
What Are New Large Value Data Types?
A large value data type is a one that will hold a value that is larger than the 8k row size. SQL Server 2005 provides three new data types for storing long columns, which are: VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX).
These new data types are easier to use than the older large object data types (TEXT, NTEXT and IMAGE). SQL Server 2005 also supports defining variables to hold these large value data types. In addition, most of the string functions available in SQL Server 2005 support working with these new large value data types, which is a great improvement over the methods used to manipulate TEXT and NTEXT in SQL Server 2000. The maximum length for a large value data type is 2^31-1 bytes. This should allow plenty of space for all those free form data entry screens and web pages.
How are Large Value Data Types Stored?
The column value for a large value data type might be stored in a page with data associated with the rest of the columns of a row, or might be stored in separate data page. You can control how SQL Server handles storage of your large value data columns by using the large value types out of row option of the sp_tableoption system stored procedure. When the large value types out for row option is set to OFF, SQL Server tries to store large value data types in the row with the rest of the columns. A large value data type will be stored in the row, if the row size does not exceed 8,000 bytes. When this table option is set to ON, the database engine doesnt even try to store large value data types in the row, but instead just stores them in additional data pages. If the large value data types are not stored with the rest of the columns of a row, then a 16 byte pointer is stored in the row. This 16-byte pointer references the root node of a tree of internal pointers. These pointers are used to map to the pages that contains the data the makes up the text value of the large value data types. The advantage of storing large value data types in a row with the rest of the columns is so SQL Server can return an entire row with a single I/O. When large value data types are stored in additional pages, the database engine will need to perform multiple I/Os in order to retrieve a single row of data. Depending on how an application handles your data record you may see some performance benefits if it can store your large value data types with the rest of your columns in a row. The basic rule of thumb is if the bulk of the SQL Server statements do not return large value data type columns, then you should store the large value data out of row. Doing this will allow for more rows to be stored in a given data page, and therefore more rows will be read per I/O.
Working With Large Value Data Types
Large value data types are much easier to work with compared to working with the older large object data types. Updating and performing string manipulation operations use the same methods as you would with any other character data type. There are some things to consider when you are working with large value data type fields in order to get the results you expect. Let me go through some examples of working with large value data types.
In this first example Ill run the following script. This script will create a simple table that contains two columns, populate it with two rows, and then display the length of the LargeString column for each row.
set nocount on create table demo (ID int identity, LargeString varchar(max)) insert into demo (LargeString) values ('abc' + replicate('d', 100000) + 'efg') insert into demo (LargeString) values ( 'abc' + replicate(cast('d' as varchar(max)), 10000) + 'efg') select ID, len(LargeString) Length from demo drop table demo
When I run this code I get the following results:
ID Length ----------- -------------------- 1 8000 2 10006
As you can see, the length of the LargeString column is not the same, even though the INSERT statements are some what similar. So why do these records have different lengths for the LargeString Column? The problem is related to the REPLICATE function. The REPLICATE function will truncate values to 8,000 bytes unless the character expression pass is either a varchar(max), or an nvarchar(max) data type. When I specified just d in the first insert statement it defaulted to a varchar(8000) field. So by casting the d as a varchar(max) field, in the second INSERT statement the REPLICATE function was able to support building a string of 10,006 bytes. So when using functions to muniplate large value data types make sure you explicitely specify each parameter that is a large value data type.
Updating a large value data type column is considerably easier then using the UPDATETEXT statement to update a text data type column. The UPDATE statement in SQL Server 2005 now supports a .WRITE() operation. The .WRITE() operation can be used to update a portion of the large value data type column. The .WRITE() operation has the following syntax:
.WRITE ( expression, @Offset , @Length )
Where expression is the character string you want to place into the large value data type column. The expression will replace the number of characters specified by the @Length parameter starting at the position specified by the @Offset parameter.
Below is an example of using the .WRITE() operation for inserting some characters into a varchar(max) column :
set nocount on create table LargeValue (id int identity, BigString varchar(max)) insert into LargeValue(BigString) values (replicate(cast('y' as varchar(max)), 10000) + 'abc') select reverse(BigString) 'Reverse of BigString' from LargeValue Update LargeValue set BigString.write('xxx',10001,1) select reverse(BigString) 'Reverse of BigString' from LargeValue drop table LargeValue
When I run this script on my SQL Server 2005 instance I get the following output:
Reverse of BigString ----------------------------------------------------------------------- cbayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy Reverse of BigString ----------------------------------------------------------------------- cxxxayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
The code above reverses the BigString varchar(max) column, and displays the column value before and after using the .WRITE() operation. Note that I replaced the letter b with the character string xxx at offset 10001.
Searching for character strings in text fields was not an easy task in SQL Server 2000, but when using varchar(max) field, searching in SQL Server 2005 is much easier. Below is an UPDATE statement that functionally does the same thing as my prior example. But instead of hard coding the offset to where the character b is located, this example searches the varchar(max) column for the character b, to determine the offset for the .WRITE() operation:
Update LargeValue set BigString.write('xxx',charindex('b',BigString)-1,1)
The support for new large value data types is a great addition to SQL Server 2005. If you are building a new application that needs to store a column that possibly might exceed 8000 bytes, you should consider using these new large value data types. It is much easier to write code against columns that are defined using one of the new large value data. If you want to simplify the code you write to handle long strings of text, you should start using large value data types instead of text data types when ever possible.