Working with XML Data in MySQL

XML was created to facilitate the transportation of data between systems and businesses in a language nonspecific way. When first introduced, a script or application process had to parse the XML data and convert it into valid format for the database and underlying system. Later, databases started to support XML so that no conversion was necessary. Today, we’ll look at XML support in MySQL and learn how to import and export data in XML format.

Exporting Data as XML Using the –xml Option

Prior to version 5.0, MySQL XML support was limited to exporting data in XML format using the MySQL command line client. Executing a command or query using the –xml or -X option told the MySQL client utility to produce the output as XML data. For instance, the following line would output all of the database variables that begin with the string “version”:

C:>mysql -u <userid> -p<password> -e 
	"SHOW VARIABLES LIKE '%version%'" –-xml

The resulting XML contains the following standard format:

  • The entire dataset is contained within the <resultset> node.
  • Each row is contained within a <row> node.
  • The columns are contained within a <field> node.
  • The column name is stored in the name attribute.
  • The column value is the <field> text node.
<?xml version="1.0"?>

<resultset statement="SHOW VARIABLES LIKE '%version%'" 
	xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
        <field name="Variable_name">protocol_version</field>
        <field name="Value">10</field>
  </row>

  <row>
        <field name="Variable_name">version</field>
        <field name="Value">5.1.30-community</field>
  </row>

  <row>
        <field name="Variable_name">version_comment</field>
        <field name="Value">MySQL Community Server (GPL)</field>
  </row>

  <row>
        <field name="Variable_name">version_compile_machine</field>
        <field name="Value">ia32</field>
  </row>

  <row>
        <field name="Variable_name">version_compile_os</field>
        <field name="Value">Win32</field>
  </row>
</resultset>

For comparison, here is the above output again as a tabular resultset:

+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.22-beta-debug   |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | suse-linux-gnu      |
+-------------------------+---------------------+

The same syntax can be applied to any SELECT statement. In my last article, entitled “Make Your MySQL Queries More Efficient with Temporary Tables”, the initial query populated the client_citizenship temporary table with records whose logical_delete_indicator was zero:

INSERT INTO client_citizenship
SELECT 	cl.client_id,
	cl.date_of_birth, 
	cl.gender,    
	cit.citizenship_id, 
	cit.country_code, 
	cit.primary_citizenship
FROM  	temp_table_article.client AS cl,   
	temp_table_article.citizenship AS cit,   
	temp_table_article.client_citizenship_rel AS rel  
WHERE   cl.client_id                 = rel.client_id 
AND     cit.citizenship_id           = rel.citizenship_id
AND     cit.logical_delete_indicator = 0
AND     cl.logical_delete_indicator  = 0;

Performing a “SELECT *” on the temp table returned the following records:

client_id, date_of_birth, gender, citizenship_id, country_code, primary_citizenship
2,         1944-01-15,    F,      4,              20,           0
2,         1944-01-15,    F,      7,              77,           1

As long as we logged in using the –xml flag, a “SELECT *” query will return the following XML-formatted results:

<?xml version="1.0"?>

<resultset statement="SELECT * FROM client_citizenship" 
xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
        <field name="client_id">2</field>
        <field name="date_of_birth">1944-01-15</field>
        <field name="gender">F</field>
        <field name="citizenship_id">4</field>
        <field name="country_code">20</field>
        <field name="primary_citizenship">0</field>
  </row>

  <row>
        <field name="client_id">2</field>
        <field name="date_of_birth">1944-01-15</field>
        <field name="gender">F</field>
        <field name="citizenship_id">7</field>
        <field name="country_code">77</field>
        <field name="primary_citizenship">1</field>
  </row>
</resultset>

One disadvantage of the –xml flag is that the mapping from relational data to XML is fixed. Hence, there is no way to modify the output. If you’re willing to do some string manipulation, you can dispense with the –xml flag entirely and produce your own XML code.

Continuing with the previous query, let’s see how we could alter the output XML so that it conforms to the following format:

  <client>
        <client_id>2</client_id>
        <date_of_birth>1944-01-15</date_of_birth>
        <gender>F</gender>
        <citizenship_id>7</citizenship_id>
        <country_code>77</country_code>
        <primary_citizenship>1</primary_citizenship>
  </client>

One way to do it would be to use the CONCAT() and GROUP_CONCAT() string functions:

	mysql>SELECT CONCAT('n<client>n', 
    ->GROUP_CONCAT('<client_id>', client_id, '</client_id>n' SEPARATOR ''), 
    ->GROUP_CONCAT('<date_of_birth>',date_of_birth,'</date_of_birth>n' SEPARATOR ''), 
    ->GROUP_CONCAT('<gender>',gender,'</gender>n' SEPARATOR ''), 
    ->GROUP_CONCAT('<citizenship_id>',citizenship_id,'</citizenship_id>n' SEPARATOR ''), 
    ->GROUP_CONCAT('<country_cd>',country_cd,'</country_cd>n' SEPARATOR ''),
    ->GROUP_CONCAT('<primary_citizenship>',primary_citizenship,'</primary_citizenship>n' SEPARATOR ''),
    ->'</client>') AS xmldoc 
    ->FROM client_citizenshipG

Although we could get the job done that way, it’s not necesarily easier than using a scripting or programming language to perform the formatting, depending of course on your familiarity with programming syntax.

Importing XML Data Using the Load_File() Function

MySQL’s XML capabilities were expanded as of version 5.1.5 to include two new functions:

ExtractValue(): Extracts a value from an XML string using XPath notation.

UpdateXML(): Returns a replaced XML fragment.

The most common way to store XML in MySQL is to use the LOAD_FILE() function to open an entire XML document, store it in a variable, and insert the variable into a table column. Here is the client_citizenship table again, with a simplified structure. This time, it contains only two fields: an auto-incrementing id and and an xml_data column to store the entire xml document. The TEXT data type is ideal for this purpose because it can accomodate extremely long strings:

CREATE TEMPORARY TABLE client_citizenship (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    xml_data TEXT NOT NULL
); 

SET @xml = LOAD_FILE("c:\client_citizenships.xml"); 

INSERT INTO client_citizenship VALUES (NULL, @xml);

From there we can use the ExtractValue() function to retrieve variables from the xml_data field. ExtractValue() takes two arguments. The first is the XML fragment to be tested; the second is the XPath expression to be matched. Xpath is a query language that was specifically created for selecting nodes from an XML document. You can learn more about XPath on the www.w3.org site. Although we won’t be needing it, the XPath argument may contain the colon character (“:”) in element selectors, in order to support namespaces. Retrieveing data is a two step process. First, we get the XML from the client_citizenship table and place into a variable:

SELECT xml_data FROM client_citizenship LIMIT 1 INTO @xml;

Now we use ExtractValue() with @xml as the first argument and an XPath string for the second:

SELECT 	ExtractValue(@xml, '//row[2]/field[1]/@name'),
	ExtractValue(@xml, '//row[2]/field[1]');

The XPath expression in the first call to ExtractValue() retrieves the value for the “name” attribute of the first <field> element of the second <row>. If you recall, the name attribute contains the column headers. The second Xpath expression fetches the text node, which is of course the field value:

...
</row>

<row> (row[2])
    <field name="client_id">2</field> (field[1])
    <field name="date_of_birth">1944-01-15</field>
...

Here is the output produced by the above query:

+----------------------------------------------+----------------------------------------+
|ExtractValue(@xml, "//row[2]/field[1]/@name") |ExtractValue(@xml, '//row[2]/field[1]') |
+----------------------------------------------+----------------------------------------+
|client_id                                     |2                                       |
+----------------------------------------------+----------------------------------------+

In the next installment, we’ll look at a nicer way to format our resultset so that the proper headers appear in the column headings, rather than the supplied expressions. We’ll also expand the above example to retrieve all the records from the XML document, using a stored procedure. Finally, we’ll learn more about the UpdateXML() function.

» See All Articles by Columnist Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles