Back in my Data Formatting, Working with XML Data in MySQL article, I described how to import XML-formatted data that was created using the mysqldump utility with the –xml flag. That solution relied on a combination of the MySQL LOAD_FILE() and ExtractValue() function. At that time, version 5.4 of MySQL was in Beta. Now, with version 5.5 in full production, the LOAD XML statement greatly simplifies the task of importing data from an XML file into a MySQL table, without having to resort to a stored procedure as previously discussed. This article provides a summary of how to use the new LOAD XML statement.
Usage and Syntax
LOAD XML INLINE is the counterpart of the mysql client’s XML output, which is produced by adding the --xml flag:
C:\>mysql -u <userid> -p<password> –-xml -e "SELECT * FROM temp_table_article.client" > c:\\sqldump.xml
The –e flag executes the statement that follows it. Then the xml-formatted results are written to the file following the > direction operator.
LOAD XML INLINE is equally suited for importing XML data created via the mysqldump utility:
C:\>mysqldump --xml temp_table_article client > c:\\sqldump.xml
In its simplest form, the LOAD XML INLINE statement takes the path to the XML file and the table into which the data should be inserted:
LOAD XML INFILE 'file_name' INTO TABLE [db_name.]tbl_name
Here then is the statement to populate the client table with the data contained in the sqldump.xml file:
LOAD XML INFILE 'c:\\sqldump.xml' INTO TABLE temp_table_article.client
By default, the content of each <row> element is considered to contain one database table row. LOAD XML INFILE accepts three formats:
FORMAT 1 – columns as attributes: <row client_id="1" date_of_birth="1976-02-12" gender="M" logical_delete_indicator="1" /> <row client_id="2" date_of_birth="1944-01-15" gender="F" logical_delete_indicator="0" /> FORMAT 2 – column names as tags: <row> <client_id>1</client_id> <date_of_birth>1976-02-12</date_of_birth> <gender>M</gender> <logical_delete_indicator>1</logical_delete_indicator> </row> <row> <client_id>2</client_id> <date_of_birth>1944-01-15</date_of_birth> <gender>F</gender> <logical_delete_indicator>0</logical_delete_indicator> </row> FORMAT 3 (*mysqldump) – columns represented by the <field> tag with a name attribute: <row> <field name="client_id">1</field> <field name="date_of_birth">1976-02-12</field> <field name="gender">M</field> <field name="logical_delete_indicator">1</field> </row> <row> <field name="client_id">2</field> <field name="date_of_birth">1944-01-15</field> <field name="gender">F</field> <field name="logical_delete_indicator">0</field> </row>
*You may recognize the last format as that produced by the mysqldump utility.
You can use any combination of the above formats in the same XML file; LOAD XML INFILE detects the format for each row and interprets it accordingly. Moreover, you aren’t limited to using the <row> tag as the row element. The default can be overridden, as we’ll see a little later on…
Working with Columns
Unlike its more flexible LOAD DATA INFILE cousin, which accepts a column list [tablename (col1,col2,...)], LOAD XML INFILE expects the fields in the XML file to correspond to the physical column names in the target table. Otherwise, MySQL doesn’t know how to match input fields with table columns. Column names that do not match those in the table are ignored, while columns in the table that don't have corresponding fields in the XML are set to NULL.
However, the LOAD XML INFILE statement also supports the SET clause. It can accept the result of just about any expression, which enables you to perform transformations on column values before assigning the result to columns or even assigning the value of one column to another.
Here are a few examples:
LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml SET column2 = column2/100; LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml SET column3 = CURRENT_TIMESTAMP; LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml SET column1 = NULL, column3 = CURRENT_TIMESTAMP, column4 = NULL;
Input File Location
With regards to file paths, I recommend placing the file on the same machine as the MySQL server and use an absolute path in the command, as in the examples we’ve seen thus far. For relative paths, MySQL uses the following rules to locate the file:
- When the relative path name contains one or more leading components – such as .\inputfile.xml , MySQL uses the data directory as the reference point.
- When no leading components are given, MySQL expects the file to reside in the database directory of the default database. Hence, if mydb is designated as the default database, the following LOAD XML INFILE statement would read the file inputfile.xml from the database directory for mydb, even though the statement explicitly loads the file into a table in the myotherdb database:
LOAD XML INFILE 'inputfile.xml' INTO TABLE myotherdb.my_table;
Windows path names can be specified using either forward or back slashes, but if you opt to use backslashes, you must double them, as in:
LOAD XML INFILE 'backup.xml'
INTO TABLE databases\\movies\\moviedb.characters;
For security purposes, the files must either reside in the database directory or have read privileges for all users.
Files may also reside on the client machine, that is, the one on which you are executing the commands. In that case, include the LOCAL keyword. That will cause the file to be read by the client program on the client machine and sent to the server. Again, the file can either be given as a full path name to specify its exact location or as a relative path name. In the case of the latter, the path is interpreted relative to the directory in which the client program was started.
Here is the last example with the LOCAL keyword included:
LOAD XML LOCAL INFILE 'backup.xml' INTO TABLE uploaddata\\moviedb.characters;
Some Useful Options
IGNORE number LINES / ROWS
Use the IGNORE number LINES or IGNORE number ROWS clause to skip the first number of rows in the XML file. One particularly noteworthy use of the IGNORE LINES option is to skip over a header line containing column names:
LOAD XML INFILE 'c:\\sqldump.xml' INTO TABLE temp_table_article.client IGNORE 1 LINES
ROWS IDENTIFIED BY
As alluded to earlier the default row element tag can be overridden. To do so, include the ROWS IDENTIFIED BY clause with the new element in single quotes, including the tags: i. e., ‘<tagname>’.
The following XML file depicts movie characters along with some related information, such as the actor who played the part, the character’s role in the movie, and the movie name:
<?xml version=1.0"?> <character> <role>Dirty Harry</role> <actor>Clint Eastwood</actor> <part>protagonist</part> <film>Dirty Harry</film> </character> <character> <role>Indiana Jones</role> <actor>Harrison Ford</actor> <part>protagonist</part> <film>Raiders of the Lost Ark</film> </character> <character> <role>Ethan Hunt</role> <actor>Tom Cruise</actor> <part>protagonist</part> <film>Mission Impossible</film> </character>
We need to designate the <character> tag as the row element, so we would include the ROWS IDENTIFIED BY clause as follows:
LOAD XML INFILE 'c:\\sqldump.xml' INTO TABLE temp_table_article.client ROWS IDENTIFIED BY '<character>';
The MySQL 5.5 LOAD XML statement adds some much needed native XML importing capability. As a result, it’s now a lot easier to load XML data into a MySQL table, without having to resort to writing a stored procedure or using third-party tools.