The importance of XML-based technologies is steadily increasing and this trend is reflected by a variety of new features in this category introduced in SQL Server 2005 Integration Services. One of the most prominent ones is the XML Control Flow task, which provides assistance with processing XML-formatted documents. In this installment of our series, we will describe its properties and functionality. Note that in order to better understand each of the topics presented here, you should familiarize yourself with basic XML-related concepts. You can find their overview on Microsoft XML Development Center Web site (we also have discussed some of them in the context of SQL Server 2000).
In short, XML (an acronym for the term eXtensible Markup Language) serves as an implementation-independent markup language, that can be easily employed in cross-platform data exchanges. Among the most distinguishable characteristics of XML-formatted documents is their tree-based, self-documenting structure, consisting of multi-level nodes, representing primary syntactical XML components, such as elements, attributes, and values (with the header consisting of an optional XML declaration, which lists the XML version and character encoding type, followed by XML namespace specifications and stylesheet processing instructions). Formal definition of these components (including permitted data types) is accomplished through an associated XML schema. The most popular choice in this category is currently XML Schema Definition (XSD), which left behind earlier contenders, such as the oldest and widely supported (due to its inclusion in XML 1.0 standard), but lacking support for a number of newer features (such as XML namespaces or custom data types), Document Type Definition (DTD) or XML Data Reduced (XDR) schema (introduced by Microsoft in their early implementations of Microsoft XML Core Services).
Due to importance and omnipresent nature of XML, its support has been improved in SQL Server 2005 Integration Services (comparing with Data Transformation Services in earlier versions of SQL Server). Even though implementation of the XML Data Flow Destination is painfully absent (you have to emulate it through scripting), you can take advantage of built-in the XML Data Flow source, as well as versatile XML Control Flow task. To get familiar with the latter, launch Business Intelligence Development Studio and initiate a new Integration Services project. Drag XML Task from the Toolbox and drop it on the Control Flow tab of the Designer interface. Launch its Editor window by selecting the Edit... item from its context sensitive menu. The General area is divided into several sections, although their exact content depends largely on the value assigned to the OperationType property. Choosing one of its six possible options (which you can view in the drop-down list in the Input section and which we will be describing in more detail next) affects the range of the remaining properties that need to be configured in order to deliver desired functionality. All of the operations require two operands (where the first one is referred to as Source and contains an original XML file against which operations are carried out) with independently configurable types (in the majority of cases, both of them can be either entered directly in the XML Task Editor interface or stored in a file or a variable). We will start with two of them (and cover the remaining four in the next article of this series):
Validate - its purpose is to verify syntactical correctness of an XML document by comparing its content against the XML Schema Definition (XSD) or Document Type Definition (DTD), depending on the value assigned to the ValidationType property (XSD or DTD). This comes in handy in situations where you want to load output of XML Data Flow Source into a data store (such as a database table), but you are not entirely certain that it follows the required specifications. For the sake of simplicity, we will test it using Direct input for both the source and the second operand (which, in this case, is the schema definition), based on the examples included in A Quick Guide to XML Schema on the MSDN Web site (which gives you a fairly comprehensive introduction to DTD and XSD schemas). We will start with the ones listed at the beginning of the section "Elements and Attributes". Our sample XML document will look as follows:
<tns:employee xmlns:tns="http://example.org/employee/" tns:id="555-12-3434"> <tns:name>Monica</tns:name> <tns:hiredate>1997-12-02</tns:hiredate> <tns:salary>42000.00</tns:salary> </tns:employee>
Copy its content directly into the Source text box (using Source Editor) in the Input section of the XML Task Editor window, with SourceType set to Direct input. In the same window, set the ValidationType to XSD, assign Direct input value to the SecondOperandType property, and copy the following entries (representing XSD schema of the sample XML document) into the Second Operand text box:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://example.org/employee/"> <xsd:element name="employee"/> <xsd:element name="name"/> <xsd:element name="hiredate"/> <xsd:element name="salary"/> <xsd:attribute name="id"/> </xsd:schema>
Once you have the Source and SecondOperand properties configured, ensure that FailOnValidationFail is assigned True value (in order to detect whether validation has been successful). Execute the package, confirming whether our XML document conforms with the schema defined for it (which should be the case providing you copied their content correctly).
Similarly, you can validate an equivalent (practically identical, with the exception of the DOCTYPE declaration) XML document that was written according to DTD specifications:
<!DOCTYPE employee SYSTEM "employee.dtd"> <tns:employee xmlns:tns="http://example.org/employee/" tns:id="555-12-3434"> <tns:name>Monica</tns:name> <tns:hiredate>1997-12-02</tns:hiredate> <tns:salary>42000.00</tns:salary> </tns:employee>
Enter its content into the Source textbox in the Input section of the XML Task Editor window. Next change the ValidationType property value to DTD and note that this will alter a number of available entries. In particular, you will no longer have the option to specify the second operand and its type (this will be accompanied by a validation warning stating that DTD Validation will rely on the DTD file defined in the DOCTYPE line in the XML document). To address this issue and provide DTD content (as the second operand for validation), copy the following DTD structure:
<!-- employee.dtd --> <!ELEMENT employee (name, hiredate, salary)> <!ATTLIST employee id CDATA #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT hiredate (#PCDATA)> <!ELEMENT salary (#PCDATA)>
Create a text file (its location and name are arbitrary). Point Flat File Connection Manager to it, with the name matching the one assigned to the DTD file in the DOCTYPE declaration of the XML document (i.e. "employee.dtd" in our case). This is accomplished by right-clicking on the tabbed area of Connection Managers and selecting the New Flat File Connection... entry. In the resulting dialog box, name the newly created connection manager as indicated (employee.dtd) and point to the DTD file using the Browse... button next to the File name label. Once you confirm that appropriate entries appear in the Columns section, close the Editor window and execute the package to verify that our XML document conforms to the DTD specifications.
XSLT (an acronym derived from the term eXtensible Stylesheet Language Transformation) - utilizes an XSLT stylesheet in order to transform a source XML document into a new format. This is performed routinely in order to change its visual appearance by adding HTML-specific tags (effectively producing an HTML web page) or modify its content by adding, removing or reordering its elements. The format of XSLT documents follow XML specifications, typically contain a number XSL processing statements (that result in appropriate HTML tags being added into the generated output), and employ XPath-based mechanisms in order to identify a portion of the XML document to which each tag should be applied (or which simply should be copied to the output without any changes).
XPath (an abbreviation for the term XML Path Language) is a notation used for locating specific components within XML documents. By describing these components as nodes of a tree, XPath provides a way to traverse its structure between any of them. The root of this tree is designated by the "/" symbol. To reach a desired element (from the root or any other element of the same tree), you need to specify its path, consisting of all elements encountered on the way to it. While elements are referenced simply using their name, attributes are distinguished by the "@" prefix. Paths can be absolute (starting from the root - with the "/" as its first character) or relative (from one non-root element to another). XPath also includes constructs, that are common to programming languages, such as loops or conditional statements that allow you to expand and customize its capabilities. For more information regarding this topic (including some straightforward examples), refer to one of our earlier articles.
We will review operations of XSLT functionality within the SSIS XML Task by taking advantage of an example posted in the Hello, World! (XSLT) article on the MSDN Web site (it is a well established tradition to use a very simple "Hello World" program as a way of helping coding neophytes shed their fears when entering the treacherous, yet extremely boring world of programming). The XML document that we will process has the following format:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="hello.xsl"?> <hello-world> <greeter>An XSLT Programmer</greeter> <greeting>Hello, World!</greeting> </hello-world>
Return to the SSIS project we created earlier and bring up the XML Task Editor interface. Select the XSLT option as the value of the OperationType property in the Input section of the General node. Once this is completed, ensure that SourceType is set to Direct input and paste the content of the XML document listed above to the Source entry (via Source Editor dialog box). Next, choose Direct input as the value of the SecondOperandType and paste the following XSLT stylesheet as its content:
<?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match="/hello-world"> <HTML> <HEAD> <TITLE></TITLE> </HEAD> <BODY> <H1> <xsl:value-of select="greeting"/> </H1> <xsl:apply-templates select="greeter"/> </BODY> </HTML> </xsl:template> <xsl:template match="greeter"> <DIV>from <I><xsl:value-of select="."/></I></DIV> </xsl:template> </xsl:stylesheet>
Finally, in the Output section, under the OperationResult subnode, select File Connection as DestinationType and create a new File Connection Manager pointing to an HTML file (e.g. XMLTaskOutput.htm) and assign it to the Destination property (with DestinationType set to File Connection). Close the XML Task Editor window and execute the package. This should populate the destination HTML file with the following content:
<HTML> <HEAD> <META http-equiv="Content-Type" content="text/html; charset=utf-8"> <TITLE> </TITLE> </HEAD> <BODY> <H1>Hello, World!</H1> <DIV>from <I>An XSLT Programmer</I></DIV> </BODY> </HTML>
When viewed via Internet Explorer, this displays the familiar "Hello, World!" message in bold font, followed by "from An XSLT Programmer" line below.
In our next article, we will continue our review of the remaining types of SSIS Control Flow XML Task operations.