Importing XML, CSV, Text, and MS Excel Files into MySQL

My most recent articles, Importing XML Data into MySQL Tables Using a Stored Procedure and Enhance Your MySQL XML Import Procedures using Prepared Statements, explored how capable stored procedures were in importing XML-formatted data. At the end of those articles, I concluded that as a DIY solution, stored procs are indeed a viable option. For those of you less inclined to write and maintain your own import code, there are tools that can markedly simplify the importing of data from various sources.  In today’s article, I’m going to demonstrate how to use the Navicat Database Admin Tool to acquire data from XML, .csv, .txt, and Excel files.

Why Navicat?

In previous articles I employed a free MySQL GUI front-end called HeidiSQL.  It was a great product, but, as some readers were apt to point out, it’s only available on Windows platforms.  In an effort to meet the needs of the majority of readers, I opted to go with a product that runs on all the major OSes, namely the big three: Windows, Mac, and Linux.

Navicat is also a well-maintained product with an extremely large user base. Both those traits mean that bugs will be stamped out very quickly.  Not that there would be many bugs left to find; at version 11.0.17, it’s a highly stable product at this point.

Note that this is a commercial product and requires a license after the free trial of 30 days.  By that time you should have a much better idea whether or not it’s something that you want to invest in or not.

Getting Started

The trial version of Navicat for MySQL may be downloaded from the company’s website.  The 30-day trial version of the software is identical to the full Enterprise Edition so you can get the full impression of all its features.  Moreover, registering with PremiumSoft via the location 3 links gives you free email support during the 30-day trial.

After you’ve downloaded the installation program, launch it and follow the instructions on each screen of the wizard to complete the installation. 

Connecting To the Database

To start working with your MySQL database, you must first establish a connection to it using the connection manager. To do that:

  1. Click the Connection button on the far top-left of the application window and select MySQL from the dropdown menu.
  2. On the New Connection screen:
    1. Give your connection a good descriptive name.
    2. By default, the MySQL server listens for connections on port 3306.  Don’t change this unless you need to.
    3. Supply credentials for an account that possesses table modification rights.
    4. You can verify your information by clicking the Test Connection button. If it comes back with a “Connection successful!” message, you can either go to another tab to enter more specialized connection information or simply hit the OK button to save your information.

Figure 1: The New Connection Dialog
The New Connection Dialog

  1. To use your credentials to establish a connection to your database, either select File > Open Connection or right-click on your connection in the list under the Connection button and select Open Connection from the popup menu.

                That will give you access to all the databases running on that server. 

  1. Double-click the database that you wish to work with.  You’ll know that it’s connected by the data store icon, which will turn green:

Figure 2: Selected Database
Selected Database

Alternatively, you can create a completely new database as follows:

  1. Right-click anywhere in the database schema list and select New Database… from the popup menu:

 

  1. In the New Database dialog…
    1. Ascribe a name to your database, such as “navicat_imports_db”.
    2. I like to set the Character set to UTF-8, which is Unicode.
    3. Set the Collation to utf8_unicode_ci. There is a similar entry called utf8_general_ci, but that is an older collation, which is hardly ever used anymore.

Figure 3: The New Database Dialog
The New Database Dialog

  1. Click the OK button to create the new database schema.

Creating the Target Tables

You could use the New Table wizard to create the target table, but I’ll give you the table definition to make things easier.

  1. Click the large Query button on the main toolbar to bring up the Query commands and then click the New Query button.
  1. Paste the following code into the Query Editor:
DROP TABLE IF EXISTS `menu_items`;
 CREATE TABLE `menu_items` (
   `id` int(11) NOT NULL,
   `name` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
   `price` decimal(5,2) DEFAULT NULL,
   `description` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
   `calories` smallint(255) DEFAULT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
  1. Click the Run button directly above the Query Editor tab to execute the query and create the new table.
  2. Hit the F5 key to refresh the database schema list and bring up the navicat_imports_db table that we just created.

Figure 4: navicat_imports_db Table
navicat_imports_db Table

We are now ready to import some data.

Working with XML Data

In keeping with the theme of my last two articles, I’ll start with XML.

The XML document that I’ll be using today is a sample document of menu items.  It contains information about some typical breakfast foods.

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
      <food id="1">
            <name>Belgian Waffles</name>
            <price>$5.95</price>
            <description>
Two of our famous Belgian Waffles with plenty of real maple syrup
 </description>
            <calories>650</calories>
      </food>
      <food id="2">
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
Light Belgian waffles covered with strawberries and whipped cream
</description>
            <calories>900</calories>
      </food>
      <food id="3">
            <name>Berry-Berry Belgian Waffles</name>
            <price>$8.95</price>
            <description>
Light Belgian waffles covered with an assortment of 
fresh berries and whipped cream
</description>
            <calories>900</calories>
      </food>
      <food id="4">
            <name>French Toast</name>
            <price>$4.50</price>
            <description>
Thick slices made from our homemade sourdough bread
</description>
            <calories>600</calories>
      </food>
      <food id="5">
            <name>Homestyle Breakfast</name>
            <price>$6.95</price>
            <description>
Two eggs, bacon or sausage, toast, and our ever-popular hash browns
</description>
            <calories>950</calories>
      </food>
</breakfast_menu>

The Import Process

In Navicat, imports are accomplished using the Import Wizard.  It will guide you through all of the steps based on the type of data source selected.  One way to launch the wizard is to right-click on target table and then select Import Wizard from the popup menu:

Figure 5: Import Wizard Command
Import Wizard Command

  1. The first screen of the wizard presents a list of file formats to choose from.  Select the XML File (*.xml) radio button and click the Next > button.

Figure 6: Import Wizard Data Format Screen
Import Wizard Data Format Screen

  1. On the following screen…
    1. Click the ellipsis (…) button to browse to the breakfast_menu.xml file on your file system.
    2. From the Encoding dropdown menu, select 65001 (UTF-8) as the file Encoding,
    3. Click Next >:
  2. The next screen is where you choose the tag that identifies each data row.
    1. Select the “food” tag from the dropdown list.
    2. Click the Consider tag attributes as table field checkbox so that the id attribute will be picked up.
    3. Click Next > to proceed.
  3. On screen four, you’ll find some additional options for the source file including the first and last data rows as well as date & time formats. Since we are importing all rows and have no dates, you can click Next >.

 

  1. The Target Table screen lets you choose between an existing table or to create a new one based on the imported data fields.  Since we started the wizard by right-clicking on the target table, it should already be selected. Click Next >.

Figure 7: Import Wizard Target Table Screen
Import Wizard Target Table Screen

  1. The Field Mappings screen is where you can map your XML text nodes and/or attributes to their respective target table columns.  Note that those with matching names will be paired up for you.  Select the id as the primary key by clicking on the cell dierectly below the Primary Key header and click Next >.

Figure 8: Import Wizard Field Mappings Screen
Import Wizard Field Mappings Screen

  1. The last decision to make is to choose how records will be appended and/or updated.  The first option to Append records will do just fine for us since we are adding all new records.  There is also an Advanced button on this screen for setting a few additional options such as using extended insert statements and inserting empty strings instead of Nulls.  Click Next > to go to the final screen.

Figure 9: Import Wizard Import Mode and Advanced Properties Dialog
Import Wizard Import Mode and Advanced Properties Dialog

  1. On the eighth and last screen of the Import Wizard, click the Start button to kick off the import process.  The results will be displayed in the gray textarea.

Figure 10: Import Wizard Start Screen
Import Wizard Start Screen

Now that you’ve gone through all of the steps to setup your import process, you don’t have to repeat them every time you want to import some records.  You can save it via the Save button.  That will allow you to run your import as a Scheduled Job as well.

Figure 11: Scheduled Jobs List
Scheduled Jobs List

Click the menu_items table to see its contents, which include our imported data.

Figure 12: The menu_items Table with Imported XML Data
The menu_items Table with Imported XML Data

At this time Navicat is limited to a single level of XML data.  Hopefully, that will be improved upon in an up-coming release.

Importing from Text and CSV files

A Fixed Width text file is a data transfer format that is often used with mainframe data feeds. In a Fixed Width text file, fields are stored in specific positions within each line of data. For example, in each line of the breakfast_menu row structure below, the name field occupies the fifty character positions of 11 through 60 inclusive:

_10 chars_ _50 chars_ _10 chars_ _255 chars_ _10 chars_
|          |          |          |           |          |
id         name       price      description calories

The greatest advantage of the Fixed Width format is that there are no delimiters that could appear in the data, as with CSV files.

Follow this procedure to create a Fixed Width file and import its contents into MySQL:

  1. Copy and paste the following data into a file and save it as breakfast_menu.txt.
id name                       price  description                                                                    calories
 1 Belgian Waffles             $5.95  Two of our famous Belgian Waffles with plenty of real maple syrup                   650
 2 Strawberry Belgian Waffles  $7.95  Light Belgian waffles covered with strawberries and whipped cream                   900
 3 Berry-Berry Belgian Waffles $8.95  Light Belgian waffles covered with an assortment of fresh berries and whipped cream 900
 4 French Toast                $4.50  Thick slices made from our homemade sourdough bread                                 600
 5 Homestyle Breakfast         $6.95  Two eggs, bacon or sausage, toast, and our ever-popular hash browns                 950        
  1. To launch the Import Wizard, click the Import button on the main toolbar.

 

    1. This time, on the first screen of the wizard, select the Text file (*.txt) radio button.  Although Fixed Width files tend to have a .txt extension, other extensions such as .dat are also possible.  For those, the second item in the file type list contains the *.* match all file types.     
    2. Select the character encoding that matches your file, e.g., Current Windows Codepage for Windows ANSI files.
  1. The third screen of the Import Wizard lets you choose between fixed width and delimited fields.  In the case of fixed width fields, you need to identify the field breaks by clicking a ruler with the mouse.  Don’t forget to click after the last field on the right as it needs to know the length of that field as well.      

Figure 13: Import Wizard Field Delimiter Screen
Import Wizard Field Delimiter Screen

  1. On screen four, you can designate a row as the field headers as well as set the first and last data rows.  It’s not a bad idea to enter both if you know the length of the input file.
  2. On screen five, make sure that the menu_items table is selected as the target.
  3. As usual, make sure that the fields on screen six map correctly.
  4. Next, choose an import mode from screen seven and you’re ready to kick off the import.
  5. Upon completion, if you take a look at the menu_items table with the imported data you’ll notice that the prices have been converted into numbers so that the dollar sign is not included with the values.

Figure 14: The menu_items Table with Imported Fixed Width Data
The menu_items Table with Imported Fixed Width Data

Importing CSV Files

The CSV (“Comma Separated Value”) file format originated in Microsoft Excel, but has since become a pseudo standard throughout the industry, even among non-Microsoft platforms.  As is the case with most exchange formats since XML, CSV files have been relegated to that of legacy format.  Modern applications that include an export format tend to use XML today.

Here is the breakfast menu data again, this time using the CSV format.

id,name,price,description,calories  
 1,"Belgian Waffles","$5.95","Two of our famous Belgian Waffles with plenty of real maple syrup",650
 2,"Strawberry Belgian Waffles","$7.95","Light Belgian waffles covered with strawberries and whipped cream",900
 3,"Berry-Berry Belgian Waffles","$8.95","Light Belgian waffles covered with an assortment of fresh berries and whipped cream",900
 4,"French Toast","$4.50","Thick slices made from our homemade sourdough bread",600
 5,"Homestyle Breakfast","$6.95","Two eggs, bacon or sausage, toast, and our ever-popular hash browns",950  

The CSV format is a lot more compact than fixed widths because each field only needs to be as long as its content. It’s also easier to parse because of the clearly identified delimiter.  Although the comma is used by convention, really any character may be used.  Whatever character you do opt for, be extra careful that it does not appear in any of the data because that will wreak havoc on the import process!  In fact, the description for the Homestyle Breakfast above does contain several commas.  To get around this, we can either substitute a different delimiter, or enclose all string data within quotes, as I did.

The process for importing CSV files is very similar to text data except that on screen three, the delimited and fixed width radio buttons are disabled so that Delimited is the only option.

Figure 15: Import Delimiter Screen for CSV Data
Import Delimiter Screen for CSV Data

Other than that, the two formats are really quite interchangeable.

Importing from MS Excel

Although Excel provides the CSV format for transferring data, Navicat can import directly from Excel. 

On the File Type screen of the Import Wizard, notice that there are actually two radio buttons for Excel: one for .xls files and one for the newer .xlsx 2007 and later format.  Choose the one for your version of Excel.

One Excel file may contain numerous workbooks, so you can import from more than one at a time.

From there, the process is not much different than for any other file type.  Just make sure that your fields are correctly mapped and that you start from the second row if you have column headers in your data.

Conclusion

For Database Administrators who are not inspired to write and maintain their own import procedures, Navicat does a good job of importing data from external data sources. 

See all articles by Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles