XML and SQL 2000 (Part 1)

Friday Apr 25th 2003 by Marcin Policht
Share:

SQL Server 2000 includes built-in support for XML. Even though some limitations of this implementation exist, conversion between relational and XML-formatted data is greatly simplified. Learn how to retrieve data as an XML document using the FOR XML clause of the SELECT statement.

One of the most common problems with managing data is its versatility. This becomes evident when exchanging data between multiple data sources and results mainly from incompatibilities in data representation and format. The impact of this type of problem affects most companies internally, but is even more acute in the business-to-business or business to consumer scenarios. Early solutions to this problem (such as Electronic Data Interchange) have never become popular due to their high cost and limited flexibility. A breakthrough took the form of Extensible Markup Language, technology derived from Standardized General Markup Language, developed under the auspices of the World Wide Web Consortium (www.w3.org). XML is frequently compared with an SGML derivative - HTML. While HTML defines the structure of data, XML's purpose is providing a description of the data structure. Just as with HTML, XML uses tags to accomplish this goal. However, unlike HTML, there are no fixed set of XML tags - they can be named in whatever way is best suitable for conveying the information about the data (hence the eXtensible in the "XML").

SQL Server 2000 includes built-in support for XML. Even though some limitations of this implementation exist, conversion between relational and XML-formatted data is greatly simplified. This includes the following features:

  • retrieving relational data in XML format with FOR XML clause of the SELECT statement,
  • inserting XML data into the database with OpenXML rowset provider,
  • execute T-SQL statements queries against SQL Server via HTTP using IIS Virtual Directory Management.

Retrieving data as XML document using FOR XML clause of the SELECT statement

In order to retrieve data as an XML document, you need to modify a standard format of the SELECT statement. The level of modification depends on the options you will use with the FOR XML clause. In general, the clause takes the following format:

FOR XML mode [, XMLDATA] [, ELEMENTS] [, BINARY BASE 64]

Where

  • Mode can be RAW, AUTO, or EXPLICIT and determines the format of the resulting XML document. In order to understand what this means, we need to briefly describe these formats. In general, there are two ways of representing data returned as an XML document - using sub elements and attributes. For example, the following XML document consists of four sub elements:
    <InventoryItem>
       <Name>"Very Small Widget"</Name>
       <Type>"Hardware"</Type>
       <QuantityInStock>20</QuantityInStock>
       <Price>0.49</Price>
    </InventoryItem>
    
    However, it is also possible to represent the same data in the form of attributes of a single element, which would produce the following XML document:
    <InventoryItem Name="Very Small Widget" Type="Hardware" QuantityInStock=20 Price=0.49 />
    
    • The RAW keyword returns an XML document by presenting each row as a single element with columns as attributes. Each element is named simply "row." This means that the following query (against Shippers table in Northwind database):
      SELECT ShipperID, CompanyName, Phone
      FROM Shippers
      FOR XML RAW
      

      would return the following result:

      <row ShipperID="1" CompanyName="Speedy Express" Phone="(503) 555-9831"/>
      <row ShipperID="2" CompanyName="United Package" Phone="(503) 555-3199"/>
      <row ShipperID="3" CompanyName="Federal Shipping" Phone="(503) 555-9931"/>
      

      The format of the result is the same, regardless of the number of tables referenced in the SELECT statement.

    • The result returned when AUTO keyword is used depends on whether one or more tables are being used in the query. With one table, the only effect of using AUTO keyword is the fact that the individual elements are no longer labeled "row" - instead they are named after the table, which in our example would return the following:
      <Shippers ShipperID="1" CompanyName="Speedy Express" Phone="(503) 555-9831"/>
      <Shippers ShipperID="2" CompanyName="United Package" Phone="(503) 555-3199"/>
      <Shippers ShipperID="3" CompanyName="Federal Shipping" Phone="(503) 555-9931"/>
      

      However, if the SELECT statement includes multiple tables, the results would return one element per table. Elements corresponding to rows of tables referenced further in the SELECT statement become sub elements of the top level elements (the level of nesting would depend on the order in which these tables appear in the SELECT statement). For example, the following query:

      SELECT ShipperID, CompanyName, OrderID
      FROM Shippers
      LEFT JOIN Orders
      ON Shippers.ShipperID = Orders.ShipVia
      FOR XML AUTO
      

      Would return the following results (with majority of data omitted for clarity):

      <Shippers ShipperID="1" CompanyName="Speedy Express">
          <Orders OrderID="10249"/>
      ...
          <Orders OrderID="11071"/>
      </Shippers>
      <Shippers ShipperID="2" CompanyName="United Package">
          <Orders Order="10440"/>
      ...
          <Orders OrderID="11077"/>
      </Shippers>
      <Shippers ShipperID="3" CompanyName="Federal Shipping">
          <Orders OrderID="10248"/>
      ...
          <Orders OrderID="11061"/>
      </Shippers>
      
    • Data returned by the SELECT statement with the AUTO keyword can be further modified by using the ELEMENT keyword, which causes a conversion of all attributes into sub elements. By adding the ELEMENTS keyword (following a comma) at the end of the FOR XML AUTO clause in our previous example, we would get the results in the following format:
      <Shippers>
        <ShipperID>1</ShipperID>
        <CompanyName>Speedy Express</CompanyName>
          <Orders>
            <OrderID>10249</OrderID>
      ...
            <OrderID>11071</OrderID>
          </Orders>
      </Shippers>
      <Shippers>
        <ShipperID>2</ShipperID>
        <CompanyName>United Package</CompanyName>
          <Orders>
            <OrderID>10440</OrderID>
      ...
            <OrderID>11077</OrderID>
          </Orders>
      </Shippers>
      <Shippers>
        <ShipperID>3</ShipperID>
        <CompanyName>Federal Shipping</CompanyName>
          <Orders>
            <OrderID>10248</OrderID>
      ...
            <OrderID>10255</OrderID>
          </Orders>
      </Shippers>
      
    • The EXPLICIT keyword gives you the most flexibility, but at the same time, it is the most complex. The SELECT statement itself is used to form the hierarchy of XML elements and attributes. In the most basic form, the first column (always an integer value labeled as Tag) specified in the statement indicates the current XML element. The second column (labeled as Parent) indicates which element is the parent of the current element (and can be set to 0 or NULL if the current element is the top one). Both values are referenced in the subsequent columns, (actual columns from the target table), to determine their relative placement. These columns are converted to XML elements or attributes using the following notation:
      Element!Tag!Attribute!Directive
      

      Where Element designates the arbitrarily chosen name of the XML element and Tag is the number of the element (as explained above). The meaning of Attribute depends on the value of Directive. If Directive is omitted, then Attribute is used as the name of the XML attribute: if Directive is set to the value element, then Attribute is used as the name of the XML element (Directive can have a number of other values - for more information refer to the SQL Books Online). This should become clearer with the help of a few examples. The following SELECT statement:

      SELECT  1 		AS Tag,
      	NULL		AS Parent,
      	ShipperID 	AS [ShippersElement!1!ShipperIDAttribute],
      	CompanyName	AS [ShippersElement!1!CompanyNameAttribute],
      	Phone		AS [ShippersElement!1!PhoneAttribute]
      FROM Shippers
      FOR XML EXPLICIT
      

      Produces the output in the form:

      <ShippersElement ShipperIDAttribute="1" CompanyNameAttribute="Speedy Express"
       PhoneAttribute="(503) 555-9831"/>
      <ShippersElement ShipperIDAttribute="2" CompanyNameAttribute="United Package"
       PhoneAttribute="(503) 555-3199"/>
      <ShippersElement ShipperIDAttribute="3" CompanyNameAttribute="Federal Shipping" 
      PhoneAttribute="(503) 555-9931"/>
      

      We could easily change the attributes in the output into elements by including the element directive:

      SELECT  1 		AS Tag,
      	NULL		AS Parent,
      	ShipperID 	AS [ShippersElement!1!ShipperIDAttribute!element],
      	CompanyName	AS [ShippersElement!1!CompanyNameAttribute!element],
      	Phone		AS [ShippersElement!1!PhoneAttribute!element]
      FROM Shippers
      FOR XML EXPLICIT
      

      Which would give us:

      <ShippersElement>
         <ShipperIDAttribute>1</ShipperIDAttribute>
         <CompanyNameAttribute>Speedy Express</CompanyNameAttribute>
         <PhoneAttribute>(503)555-9831</PhoneAttribute>
      </ShippersElement>
      <ShippersElement>
         <ShipperIDAttribute>2</ShipperIDAttribute>
         <CompanyNameAttribute>United Package</CompanyNameAttribute>
         <PhoneAttribute>(503) 555-3199</PhoneAttribute>
      </ShippersElement>
      <ShippersElement>
         <ShipperIDAttribute>3</ShipperIDAttribute>
         <CompanyNameAttribute>Federal Shipping</CompanyNameAttribute>
         <PhoneAttribute>(503) 555-9931</PhoneAttribute>
      </ShippersElement>
      

      You can get more creative by using multiple SELECT statements, combined together with the UNION ALL clause (note that this also requires an appropriately formed ORDER BY clause). For example:

      SELECT  1 		AS Tag,
      	NULL		AS Parent,
      	ShipperID 	AS [Shippers!1!ShipperID],
      	NULL		AS [Shippers!2!CompanyName]
      FROM Shippers
      UNION ALL
      SELECT 	2,
      	1,
      	ShipperID,
      	CompanyName
      FROM Shippers
      ORDER BY [Shippers!1!ShipperID],[Shippers!2!CompanyName]
      FOR XML EXPLICIT
      

      Which produces the following results:

      <Shippers ShipperID="1">
         <Shippers CompanyName="Speedy Express"/>
      </Shippers>
      <Shippers ShipperID="2">
         <Shippers CompanyName="United Package"/>
      </Shippers>
      <Shippers ShipperID="3">
         <Shippers CompanyName="Federal Shipping"/>
      </Shippers>
      
  • " XMLDATA specifies that, in addition to the XML document, its schema will be returned (at the beginning of the output). XML schema provides information about the structure of data contained in the XML document. The schema, in such case, reflects the definition of tables referenced in the SELECT statement. For example, adding it to one of our earlier examples:
    SELECT ShipperID, CompanyName, Phone
    FROM Shippers
    FOR XML RAW, XMLDATA
    

    Would produce, in addition to already presented results, the following:

    <Schema name="Schema4" xmlns="urn:schemas-microsoft-com:xml-data" 
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <ElementType name="row" content="empty" model="closed">
        <AttributeType name="ShipperID" dt:type="i4"/>
        <AttributeType name="CompanyName" dt:type="string"/>
        <AttributeType name="Phone" dt:type="string"/>
        <attribute type="ShipperID"/>
        <attribute type="CompanyName"/>
        <attribute type="Phone"/>
      </ElementType>
    </Schema>
    
  • " Finally, BINARY BASE 64 is used to return the binary data in base64 encoded format, (obligatory when retrieving binary data in combination with RAW and EXPLICIT mode).

Information presented in this article should familiarize you with extracting data from a relational database in XML format. In the next article of this series I will describe the process of using OpenXML to create rowset based views of XML data, which can be subsequently manipulated as if they contained relational data.

» See All Articles by Columnist Marcin Policht

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved