Generate Dynamic MySQL Multi-row INSERT Statements

Monday Jan 21st 2019 by Rob Gravelle

Discover how to generate a multiple-row insert statement dynamically using the programming language of your choice.

In the Optimize MySQL Inserts Using Batch Processing article, a couple of ways to achieve faster INSERT speeds in MySQL were explored. The last example that was shown employed multiple-row inserts. At around 65% of the throughput of the LOAD DATA INFILE statement, the multiple-row insert is substantially faster than individual inserts, while not requiring a temporary text file.

In today's tutorial, you'll get a closer look at how you can generate a multiple-row insert statement dynamically using the programming language of your choice. First, you'll see pseudo-code that outlines the steps; then, you'll see that translated into actual code.

Basic Steps

In essence, the idea is to populate the INSERT values and then execute the query. In this example, we'll use parameters because these will automatically handle special characters and guard against SQL injection attacks.

The general steps are as follows:

  1. Declare variables to hold the INSERT values and your desired batch size.
  2. Iterate over your data, adding parameter placeholders for each row.
  3. Once you've done that, construct and execute the INSERT statement.

Here's how that might play out in pseudo-code:

Array:   $inserts
Integer: $batch_size

while the length of $inserts is less than $batch_size
    Add parameters to $inserts
end while

Set $CommandText to 'INSERT INTO mytable (fields…) VALUES ' plus the comma-joined $inserts
Set Parameters 

Execute Query

Translating the above Steps into Real Code

Now let's switch gears and convert the above pseudo-code into the real thing. The example that we'll be looking at here enumerates over text email files in a directory and inserts them into a MySQL table in batches of one hundred. It's written in PowerShell, Windows' powerful scripting language.

Along with the $batch_size, you can store the insert parameters, file names, and their contents in arrays:

$batch_size          = 100
$global:inserts      = @()
$global:fileNames    = @()
$global:fileContents = @()
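
Note that the code in this article also assumes that the MySQL Connector/NET assembly has been loaded and that an open connection is stored in $connection. Here's a minimal sketch of that setup; the DLL path and connection string below are placeholders that you'll need to adjust for your own environment:

#Load the MySQL Connector/NET assembly (path is a placeholder)
Add-Type -Path "C:\Program Files (x86)\MySQL\MySQL Connector Net 8.0\Assemblies\v4.5.2\MySql.Data.dll"

#Open a connection (server name and credentials are placeholders)
$connectionString = "server=localhost;uid=myuser;pwd=mypassword;database=mydb"
$connection       = New-Object MySql.Data.MySqlClient.MySqlConnection($connectionString)
$connection.Open()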

Here's the code that enumerates over the email files; a try-catch block is employed to trap exceptions:

$files = [IO.Directory]::EnumerateFiles("F:\Inbox", "*.txt")
foreach ($file in $files) {
    try {
        #process email files…
     }
     catch {
        Write-Host $_.Exception.Message
     }
}

Inside the try block, you can create named parameters for the file name and content fields, using a numeric suffix to uniquely identify each pair. The file name and contents are also appended to the global fileNames and fileContents arrays for later use:

if ($global:inserts.length -lt $batch_size) { 
  #Adding 1 to the current zero-based global inserts array length 
  #offsets it so that the $filesCounter suffix starts at 1.
  #i.e.: (@fileName1, @fileContent1), (@fileName2, @fileContent2), (@fileName3, @fileContent3)
  $filesCounter         = $global:inserts.length + 1
  $global:inserts      += "(@fileName$filesCounter, @fileContent$filesCounter)"
  $global:fileNames    += [System.IO.Path]::GetFileName($file)
  $global:fileContents += Get-Content -Raw $file
}
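
To make that bookkeeping concrete, once three files have been collected, joining the placeholder groups with commas produces a single multi-row statement. The snippet below simply prints what the eventual CommandText would look like at that point:

$ofs = ','   #arrays are joined with this separator when converted to a string
Write-Host ('INSERT INTO msg_table (filename,msg) VALUES ' + $global:inserts)
#Prints:
#INSERT INTO msg_table (filename,msg) VALUES (@fileName1, @fileContent1),(@fileName2, @fileContent2),(@fileName3, @fileContent3)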

In the else part of the if-else block, the SQL statement is executed. But, before that, you have to declare a new MySqlCommand object and set its CommandText to the SQL string. Then, the parameters must be populated with the contents of the global fileNames and fileContents arrays, numbering them so that they match the placeholder suffixes generated above. Finally, the global arrays are reset and seeded with the current file so that a fresh batch can begin. Here's the code to accomplish that:

else {
  $insertCommand = New-Object MySql.Data.MySqlClient.MySqlCommand
  $insertCommand.Connection = $connection
  #Set the Output Field Separator so the $inserts array is joined with commas
  $ofs = ','
  $insertCommand.CommandText = 'INSERT INTO msg_table (filename,msg) VALUES ' + $global:inserts
  $insertCommand.Prepare()
  for($i = 0; $i -lt $global:fileNames.Length; $i++) {
    #Parameter suffixes start at 1 to match the placeholders built above
    $insertCommand.Parameters.AddWithValue("@fileName$($i + 1)",    $global:fileNames[$i])
    $insertCommand.Parameters.AddWithValue("@fileContent$($i + 1)", $global:fileContents[$i])
  }
  
  $iRowsAffected = $insertCommand.ExecuteNonQuery()
  
  Write-Host "Inserted $iRowsAffected rows."
  
  #Start a new batch, seeded with the current file so that it isn't skipped
  $global:inserts      = @('(@fileName1, @fileContent1)')
  $global:fileNames    = @([System.IO.Path]::GetFileName($file))
  $global:fileContents = @(Get-Content -Raw $file)
}

To recap everything you just did, here is the full source code:

$batch_size          = 100
$global:inserts      = @()
$global:fileNames    = @()
$global:fileContents = @()

$files = [IO.Directory]::EnumerateFiles("F:\Inbox", "*.txt")
foreach ($file in $files) {
    try {
        if ($global:inserts.length -lt $batch_size) { 
          $filesCounter         = $global:inserts.length + 1
          $global:inserts      += "(@fileName$filesCounter, @fileContent$filesCounter)"
          $global:fileNames    += [System.IO.Path]::GetFileName($file)
          $global:fileContents += Get-Content -Raw $file
        }
        else {
          $insertCommand = New-Object MySql.Data.MySqlClient.MySqlCommand
          $insertCommand.Connection = $connection
          #Set Array delimiter
          $ofs = ','
          $insertCommand.CommandText = 'INSERT INTO msg_table (filename,msg) VALUES ' + $global:inserts
          $insertCommand.Prepare()
          for($i = 0; $i -lt $global:fileNames.Length; $i++) {
            $insertCommand.Parameters.AddWithValue("@fileName$($i + 1)",    $global:fileNames[$i])
            $insertCommand.Parameters.AddWithValue("@fileContent$($i + 1)", $global:fileContents[$i])
          }
          
          $iRowsAffected = $insertCommand.ExecuteNonQuery()
          
          Write-Host "Inserted $iRowsAffected rows."
          
          #Start a new batch, seeded with the current file so that it isn't skipped
          $global:inserts      = @('(@fileName1, @fileContent1)')
          $global:fileNames    = @([System.IO.Path]::GetFileName($file))
          $global:fileContents = @(Get-Content -Raw $file)
        }
    }
    catch {
        Write-Host $_.Exception.Message
    }
}
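
One caveat: because the INSERT only fires when a batch fills up, any files left over in a final, partial batch are never written. Here's a rough sketch of how you might flush the remainder (and close the connection) after the loop, reusing the same pattern as the else branch above:

#Flush any partial batch left over after the loop
if ($global:fileNames.Length -gt 0) {
  $insertCommand = New-Object MySql.Data.MySqlClient.MySqlCommand
  $insertCommand.Connection = $connection
  $ofs = ','
  $insertCommand.CommandText = 'INSERT INTO msg_table (filename,msg) VALUES ' + $global:inserts
  $insertCommand.Prepare()
  for($i = 0; $i -lt $global:fileNames.Length; $i++) {
    $insertCommand.Parameters.AddWithValue("@fileName$($i + 1)",    $global:fileNames[$i])
    $insertCommand.Parameters.AddWithValue("@fileContent$($i + 1)", $global:fileContents[$i])
  }
  
  $iRowsAffected = $insertCommand.ExecuteNonQuery()
  
  Write-Host "Inserted $iRowsAffected rows."
}

$connection.Close()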

A Second Example Using a Function

Once you've got the hang of it, you should consider putting all of your insert code into a function that you can reuse for any table that you want to apply bulk inserts to.

This example comes from StackOverflow. It makes use of PHP's implode() function to generate the table field names and values to insert:

<?php
    function dynamicInsert($table_name, $assoc_array){
        $keys   = array();
        $values = array();
        foreach($assoc_array as $key => $value){
            $keys[]   = $key;
            $values[] = $value;
        }
        $query = "INSERT INTO `$table_name`(`".implode("`,`", $keys)."`) VALUES('".implode("','", $values)."')";
        echo $query;
    }
    
    //Invoking the function
    dynamicInsert("users", array(
        "username" => "Test User",
        "password" => "Password123"
    ));
?>

As mentioned by the contributor, the above code is not secure, because the values are concatenated directly into the SQL string. You should escape them with mysqli_real_escape_string() (the older mysql_real_escape_string() was removed in PHP 7), or better yet use prepared statements, and perform any other necessary sanitation on the variables being sent to your MySQL server. You may also want to prevent this script from being executed on any public facing platform as "a dynamic insert could allow for huge security risks!"
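
To tie this back to the PowerShell approach used earlier, here's a minimal sketch of what a comparable reusable function might look like with proper parameterization. The function name and the calling convention (an open MySqlConnection plus an array of hashtables keyed by column name) are illustrative choices rather than anything from the original example:

function Invoke-MultiRowInsert {
    param(
        [MySql.Data.MySqlClient.MySqlConnection] $Connection,
        [string]   $TableName,
        [string[]] $ColumnNames,
        [array]    $Rows          #array of hashtables keyed by column name
    )

    #Build one "(@col1N, @col2N, ...)" placeholder group per row
    $valueGroups = @()
    for ($rowIndex = 1; $rowIndex -le $Rows.Length; $rowIndex++) {
        $placeholders = $ColumnNames | ForEach-Object { "@$($_)$rowIndex" }
        $valueGroups += '(' + ($placeholders -join ', ') + ')'
    }

    #Note: table and column names cannot be passed as parameters,
    #so only use trusted identifiers here
    $command = New-Object MySql.Data.MySqlClient.MySqlCommand
    $command.Connection  = $Connection
    $command.CommandText = "INSERT INTO $TableName (" + ($ColumnNames -join ',') +
                           ') VALUES ' + ($valueGroups -join ',')

    #Bind every value to its numbered placeholder
    for ($rowIndex = 1; $rowIndex -le $Rows.Length; $rowIndex++) {
        $row = $Rows[$rowIndex - 1]
        foreach ($column in $ColumnNames) {
            [void]$command.Parameters.AddWithValue("@$($column)$rowIndex", $row[$column])
        }
    }

    return $command.ExecuteNonQuery()
}

You could then call it for any table, for example:

$rows = @(
    @{ filename = 'msg1.txt'; msg = 'Hello' },
    @{ filename = 'msg2.txt'; msg = 'World' }
)
$rowsInserted = Invoke-MultiRowInsert -Connection $connection -TableName 'msg_table' -ColumnNames @('filename','msg') -Rows $rows
Write-Host "Inserted $rowsInserted rows."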

Conclusion

Don't assume that multiple-row inserts will outperform individual inserts under all conditions. Be sure to compare your process against a suitable benchmark. I recently converted a script to employ bulk inserts and was shocked when it took about 20% longer than inserting one row at a time! The moral of that story is that, if you are working with huge datasets, you should strongly consider using a compiled language or perhaps employing a cloud service that can handle large volumes. The reason is that the overhead of preparing the data can eat up a lot of cycles, depending on the language and environment in which the code runs.
