In the first article of the series, I explained the ways of presenting relational data in XML format using the FOR XML clause of the T-SQL SELECT statement. Now it is time to look into the reverse process of inserting XML formatted data into SQL databases using OpenXML function.
Inserting content from an XML document into an SQL database involves the following three steps:
- Running sp_xml_preparedocument stored procedure, which creates memory-resident representation of the XML document (in a form of tree structure) and returns an integer value. This value is used by both OpenXML and sp_xml_removedocument as the pointer to the same XML document. Internally, sp_xml_preparedocument stored procedure uses the MSXML2 parser (included in SQL 2000) to parse the XML document.
- Executing OpenXML function, which uses the memory-resident XML document created in the first step and converts it into the "relational" format. By incorporating OpenXML function into INSERT statement, you can insert this converted data into any SQL database (although note that what you do with this data is entirely up to you)
- Running sp_xml_removedocument stored procedure, which frees memory allocated to the XML document by sp_xml_preparedocument. The only purpose of this step is the clean up after the first two, nevertheless you should always keep it in mind, in order to prevent problems resulting from running out of memory.
Now that we know the steps, let's take a closer look into each one of them:
- sp_xml_preparedocument has the following format:
sp_xml_preparedocument idoc OUTPUT [,xmltext] [,xpath_namespaces]Where the arguments are:
- idoc - output argument, which contains the integer value identifying memory-resident representation of XML document
- xmltext - the XML document to be parsed (which can be of type char, nchar, varchar, nvarchar, text, or ntext)
- xpath_namespaces - names of namespaces to be used by OpenXML function when working with XML document. This is required if the XML document contains user-defined namespaces. XML namespaces are typically used to set boundaries separating XML components (attributes and elements) created by different developers. This prevents accidental name conflicts between identically named attributes (or elements) in two namespaces. Namespaces are identified by Universal Resource Identifiers (URIs), typically formatted the same way as Universal Resource Locators (URLs) that you know from browsing the Internet. Some namespaces are well known and are automatically recognized by MSXML parser; however, if the XML document contains any custom ones, they need to be included as the xpath_namespaces parameter.
- OpenXML function has the following format:
OpenXML(idoc, rowpattern [,flags]) WITH (SchemaDeclaration | TableName)]
Where the arguments are:
- idoc - integer value returned by the sp_xml_preparedocument stored procedure and pointing to the memory-resident XML document
- rowpattern - in essence, this parameter determines which elements and attributes of an XML document should be converted to rowset format. This is done by specifying XPath formatted string identifying the target nodes to be processed as rows. In order to understand it better, the following section provides explanation of the basic principles of XPath.
XPath is an abbreviation of the term XML Path Language, which designates a language that is used for the processing of XML documents. XPath looks at XML elements as nodes of a tree (and in this aspect, it reflects the way we look at files and directories in the file system of a drive). Using its own set of symbols, XPath allows you to traverse the tree structure, consisting of XML elements, from the root to any element or attribute, as well as between any two elements or attributes. The root of this tree is referred to as the "/". To access an element, you need to specify its path, which consists of all elements traversed on the way to it. When refering to an element, you simply use its name; references to an attribute are distinguished by adding the "@" symbol in front of it. Paths can be absolute (starting from the root - which means that the path has "/" as its first character) or relative (from one non-root element to another). To make it easier to understand, let's look at some examples. First, consider the following sample XML document containing listing of Inventory items:
<Inventory> <Product ID="00001"> <Name>"Very Small Widget"</Name> <Type>"Hardware"</Type> <QuantityInStock>20</QuantityInStock> <Price Category="Wholesale">0.49</Price> <Price Category="Retail">0.99</Price> </Product> <Product ID="00002"> <Name>"Not So Small Widget"</Name> <Type>"Hardware"</Type> <QuantityInStock>200</QuantityInStock> <Price Category="Wholesale">0.99</Price> <Price Category="Retail">1.49</Price> </Product> </Inventory>
In this case, the Product elements can be reached by:
Note that this XPath expression refers to all Product elements. If you wanted to retrieve all Product elements and their sub-elements, you would use the following notation:
would refer to all elements in the entire XML tree. The notation
would reference all of the ID attributes. If the @ID attribute were also used in elements other than the Product, you could reference all of these attributes by using the notation:
You can also traverse the tree in the opposite direction (towards the root) by using the ".." symbol, which designates the parent of the current element.
All of the XPath examples given so far return all of the elements or attributes, which occupy specific locations within the tree. If you want to be more specific and select them based on their value, you can use predicates. For example, to reference all products of the Hardware type, you would use the following :
To find a Product with the attribute ID equal to "00001", you would specify:
The remaining parameters of OpenXML function are:
- flags - integer value, which determines whether the returned rowsets should contain attributes (value of 0 or 1), elements (value of 2), or both attributes and elements (value of 3)
- SchemaDeclaration - specifies the schema (along with its properties) used to define the structure of the rowset that will be returned. You can think of XML schema as an equivalent of a SQL table definition.
- TableName - instead of specifying the schema to define the structure of the rowset, you can alternatively provide name of target table
- sp_xml_removedocument has the following format:
Where its only argument (idoc) is the integer value returned by the sp_xml_preparedocument stored procedure which designates memory-resident XML tree to be removed.
Once we have a better understanding of all the functions and stored procedures involved, let's take a look at example of inserting content of XML document into a SQL database.
DECLARE @idoc int DECLARE @doc nvarchar(300) SET @doc = '<?xml version="1.0" ?> <ROOT> <Shippers ShipperID="4" CompanyName="The RoadRunner" Phone="(503) 666-9831"/> <Shippers ShipperID="5" CompanyName="Speedy Gonzalez" Phone="(503) 666-3199"/> </ROOT>' EXEC sp_xml_preparedocument @idoc OUTPUT, @doc INSERT Northwind..Shippers SELECT * FROM OPENXML (@idoc, N'/ROOT/Shippers') WITH Northwind..Shippers EXEC sp_xml_removedocument @idoc
First, we declare two variables. The first one (@idoc) will hold the integer value returned by sp_xml_preparedocument stored procedure and will point to the memory-resident XML document. The second one (@doc) contains the XML document itself. In this case, I decided to add two more entries to the Shippers table in the Northwind database. Next, we execute the sp_xml_preparedocument stored procedure, providing @doc as its input argument and receiving the pointer to residing in memory XML document stored in @idoc. We include SELECT * FROM OPENXML as part of INSERT statement, which populates the Shippers table with the content of XML document. OPENXML statement uses Shippers table name to define structure of the rowset and the XPath set to '/ROOT/Shippers' to determine that the Shippers elements of the XML document should be returned. Since the Flags parameter is not specified, it takes on its default value of 0, which causes arguments of the XML documents to be added to the Shippers table.
As you can see, once you get through the initial hurdles of understanding the syntax of OPENXML function and associated stored procedures, the process of inserting the XML formatted data into SQL 2000 databases is fairly straightforward. In the next article of this series, we will explore executing T-SQL statements against SQL Server via HTTP using IIS Virtual Directory Management.