XML and SQL 2000 (Part 8)

Tuesday Aug 12th 2003 by Marcin Policht
Share:

Newer versions of SQLXML, in addition to the features previously covered in this series, provide another type of functionality--accessible through scripting--that allows bulk loading of XML data into SQL databases. Part 8 of 'XML and SQL 2000' gives an overview of the bulk load capabilities of the COM object included with SQLXML version 3.0.

So far, this series has concentrated on configuration options available via the graphical interface of IIS Virtual Directory Management for SQL Server. However, newer versions of SQLXML, in addition to the features we have already covered, provide another type of functionality, accessible through scripting, that allows bulk loading of XML data into SQL databases. Such functionality will be the topic of this article. For this, you should install the most recent release of SQLXML (version 3.0 currently at Service Pack 1 level, available for download from the Microsoft Web Site).

Mechanisms for modifying SQL databases using XML formatted data, such as the previously described updategrams, are not suitable for large amounts of data, because they require the entire XML document to be loaded into memory prior to initiating the insert operation. Bulk loading of XML data (equivalent to bulk loading using standard SQL server methods as Bulk Copy Program or BULK INSERT) provides a more efficient method of dealing with such situations.

XML-based bulk load operation is implemented as the XML Bulk Load COM component. You can take advantage of this component through use of programming (including .NET applications) or scripting (e.g. via VBScript, which will be the method presented here). The XML Bulk Load component uses data in XML format stored in an XML document (or an XML fragment) and an annotated mapped schema in XDR (XML Data Reduced) or XSD (XML Schema Definition) format (which provides XML data description and verification).

As you might recall from our earlier discussions, XDR schema has been supported (as the de-facto standard) since the release of SQL Server 2000 (and accompanying SQLXML 1.0). Once the XML Schema Definition standard had been ratified (and recommended) by the World Wide Web Consortium, Microsoft included it in subsequent releases of SQLXML (2.0 and 3.0). If you intend to work with XSD schemas, you can convert XDR schema examples presented in this article to XSD format using CVTSCHEMA.EXE included with SQLXML 3.0 (you can find this file in Program Files\SQLXML 3.0\bin folder).

To better understand the bulk load mechanism, let's take a look at a fairly simple sample of code that inserts XML-formatted data into the Shippers table of the Northwind database.

Set oXMLBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
oXMLBulkLoad.ConnectionString = "PROVIDER=SQLOLEDB;SERVER=YourSQLServer;" & _
				"DATABASE=Northwind;INTEGRATED SECURITY=sspi;"
oXMLBulkLoad.Execute "C:\XMLData\Shippers.xdr", "C:\XMLData\Shippers.xml"
Set oXMLBulkLoad = Nothing

As you can see, first we instantiate an object (in our example, called oXMLBulkLoad) of type SQLXMLBulkLoad - this class is represented in the HKEY_CLASSES_ROOT portion of the registry by SQLXMLBulkLoad.SQLXMLBulkLoad entry (referred to as ProgID), hence this is the parameter used by the CreateObject method. Next, we specify the connection string property of the object we just created. The connection string contains the provider name (SQLOLEDB), server name (make sure you replace the value "YourSQLServer" with the name of your SQL server), database name (Northwind in this case) and type of authentication used for the connection (Windows integrated). The actual bulk insert takes place when the Execute method of the oXMLBulkLoad object is invoked. The method takes two parameters - full paths of the files containing a schema - in XDR or XSD format - and corresponding XML document, respectively. If you save the script as BulkLoadShippers.vbs, you could execute it by running

cscript BulkLoadShippers.vbs

at the Command Prompt or simply double-clicking the file from Windows Explorer.

However, before you do this, you need to ensure that you have properly formatted schema and XML-document files, available in the locations specified in the script. Our XML document will have the following format:

<?xml version="1.0" encoding="utf-8" ?> 
  <Shippers>
    <Shipper ShipperID="4" CompanyName="Speedy Gonzalez" Phone="(503) 555-9934" /> 
    <Shipper ShipperID="5" CompanyName="The RoadRunner" Phone="(503) 555-3456" /> 
  </Shippers>

which corresponds to the following XDR schema:


<Schema xmlns="urn:schemas-microsoft-com:xml-data"
	xmlns:dt="urn:schemas-microsoft-com:datatypes"
	xmlns:sql="urn:schemas-microsoft-com:xml-sql">

	<ElementType name="Shipper" sql:relation="Shippers">
		<AttributeType name="ShipperID" required="yes" dt:type="int"/>
		<AttributeType name="CompanyName" required="yes" dt:type="string"/>
		<AttributeType name="Phone" required="no" dt:type="string"/>

		<attribute type="ShipperID"/>
		<attribute type="CompanyName"/>
		<attribute type="Phone"/>
	</ElementType>
</Schema>

Our XDR schema is rather straightforward and can be created without major effort, but constructing more elaborate XDR schemas tends to be a mundane and time consuming task (except for very simple data structures). Fortunately, you can simplify this process by using the XML View Mapper utility, freely downloadable from the Microsoft Web site.

After the installation, the utility is available from All Programs -> Microsoft SQL Server XML Tools menu. The first time you launch it, you will be prompted to create a new project. Once you specify its name and location, you will be presented with the XML View Mapper interface. The interface is divided into two main windowpanes - Map Editor with three subpanes on the left and Project Editor on the right. From here, you can generate XDR schemas based on SQL database tables and add annotations to them. Start by clicking on the leftmost subpane of Map Editor (labeled "Click to import an SQL schema") and fill out entries in the Data Link Properties dialog box that define a connection to the target server and database. Clicking on the OK button in this initial dialog box, will bring up the next one, titled New Database Tables, from which you can select tables and views for which you intend to create mapping schema. This will bring back the Map Editor window, this time with the tables you selected showing in the leftmost subpane. Next, select the "Generate XDR module..." option from the Tools -> Utilities menu. This will automatically generate an item under XDR node in the Project Editor, corresponding to the tables displayed in the Map Editor leftmost subpane. To generate a mapping schema for this XDR module, drag it to the rightmost subpane of the Map Editor. The mapping will be displayed as a new item under Map Modules node in the Project Editor. You can export XDR schema using the "Export XDR Schema" option from the Tools menu.

The steps described above introduced you to the most basic features of XML Map Viewer. Note, though, that the tool is much more powerful. For example, you can generate XDR mapping schemas representing multi-table relationships (by dragging and dropping respective columns between SQL Module and XDR Module subpanes of the Map Editor) or run tests by submitting XPath queries against the mappings. You can also generate XDR schema from an XML document or convert schema from DTD (Document Type Definition - schema definition mechanism used in early versions of XML) to XDR format.

Now that we are familiar with simpler ways of generating XDR schema, let's go back to our original script and review the different options available when bulk importing XML data into SQL Server databases. The SQLXMLBulkLoad object we created has a number of properties that can be used to modify the default bulk load behavior. Among the most relevant are:

  • CheckConstraints - when set to False, prevents check for constraints during the data load. With large amounts of data, this might significantly speed up the load process.

  • ErrorLogFile - when set to a valid file system path, it causes creation of an XML-based error log that can be analyzed to determine reason for bulk load failures.

  • ForceTableLock - when set to True, locks the entire table for the duration of the bulk insert, which typically speeds up the load process.

  • KeepIdentity - when set to False, SQL Server will create identity value during data import. Otherwise (also by default), values specified in the XML document, will be used for this purpose.

  • KeepNulls - when set to True, forces SQL server to insert a NULL value for unmapped columns (ones for which values are not included in XML document), even if default, non-NULL is defined for them.

  • SchemaGen - when set to True, causes creation of tables (if they do not already exist) during the import process. You can also drop existing tables by setting SGDropTables property to True.

To take advantage of any of these properties, you need to set its value (typically to either True or False) prior to invoking the Execute method). For example, if you want to ignore check for constraints during bulk loading, you would use the following code:

Set oXMLBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
oXMLBulkLoad.ConnectionString = "PROVIDER=SQLOLEDB;SERVER=YourSQLServer;" & _
				"DATABASE=Northwind;INTEGRATED SECURITY=sspi;"
oXMLBulkLoad.CheckConstraints = False
oXMLBulkLoad.Execute "C:\XMLData\Shippers.xdr", "C:\XMLData\Shippers.xml"
Set oXMLBulkLoad = Nothing

In this article, I presented an overview of bulk load capabilities of COM object included with the SQLXML version 3.0. In the next article of this series, we will look at another method of modifying SQL Server databases, called diffgrams, which has been introduced after the release of SQL Server 2000.

» See All Articles by Columnist Marcin Policht

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved