How to Merge Records from Two Identical Access Databases

Saturday Jul 19th 2003 by Danny Lesandrini

The process of merging relational data can get quite complicated, especially when you have numerous core-data tables as well as lookup tables. Danny Lesandrini explains the basic steps.

A while back, I replied to a post at the Comp.Databases.MS-Access newsgroup with the subject line of "What's the simplest way to merge two databases?" The user had attempted to bring in the data using the File | Get External Data menu option, but he discovered that all he accomplished was to import his tables. That is a far cry from merging or combining data and understandably so. A properly designed database includes primary and foreign keys and oftentimes, auto generated record ID numbers. As we will see, you cannot just append these records to a single table.

The process of merging this kind of relational data can get quite complicated, especially when you have numerous core-data tables as well as lookup tables. While every case has its own peculiarities, I will explain the basic steps using an employee time tracking application table schema.

Imagine the program has been deployed at two separate sites with two separate back end mdb data files. Some tables contain unique data, such as tblEmployee and tblTimeRecord, and some store not so unique data such as tblStatus.

Let's assume that each table has an autonumber ID field which starts counting from 1. Both sites will have their own employee and time records from 1 to n.

If you merge Employee records from Site B into Site A, you will have to renumber the Site B records so their values don't collide with any Site A ID numbers.

If you change the EmployeeID for Site B, you will have to cascade those changes to other related Site B tables, such as tblTimeRecord. How do you do this?

NOTE: Before beginning, back up your data and always work with a copy of the actual production data!

Consider these necessary changes to Site B database...

  1. Remove the autonumber from EmployeeID field in tblEmployee for Site B.

  2. Store largest EmployeeID from Site A in variable named lngMaxSiteA_ID.

  3. Make sure that Cascade Updates are turned on for every relationship

    in the Site B database that has a relation on EmployeeID.

  4. Run a simple UPDATE query to add the value, lngMaxSiteA_ID, to EmployeeID for every record at Site B tblEmployee. The Cascade Updates should propagate those changes throughout your database.

  5. Append all Site B records from tblEmployee to the Site A tblEmployee.

    (Even though Site A still has an autonumber on EmployeeID, the records will go in without error and the correct ID numbers will be assigned.)

  6. Repeat this process for every table with an autonumber ID.

That was easy, now wasn't it? However, we have not gotten to the hard part yet.

The REAL problem comes in when dealing with 'lookup' tables, like the one I named tblStatus. Let's say that each task in tblTask has an associated status.

If each database (Site A and Site B) is installed with a preset list of statuses, then there is no problem, but if the end users can enter their own values, you will end up with something like this:

Site A: tblStatus
	1  Open
	2  Completed
	3  Cancelled
	4  Needs Approval
	Site B: tblStatus
	1  Completed
	2  Open
	3  Cancelled
	4  Pending

In this case, you will need to create a mapping table to show how these two status tables relate to each other. This tblMapStatus is created in the Site B database and looks like this.

	OldID (B)   NewID (A)    Status
	  1           2          Completed
	  2           1          Open
	  3           3          Cancelled
	  4          Null        Pending

See the problem here?

Site A has a status that we do not see in Site B. That is not a problem from one aspect, namely that Site B does not use the status 'Needs Approval' so we do not need to map to it. But we do need its ID number, #4, for our 'Pending' status.

Therefore, you need to go back to tblStatus at Site A, add the missing status ('Pending') and make note of its newly assigned ID number. In our example, that number is 5, so we update our tblMapStatus accordingly.

	OldID (B)  NewID (A)   Status
	  1          2         Completed
	  2          1         Open
	  3          3         Cancelled
	  4          5         Pending

Now we will have to update our database at Site B by running an UPDATE query...

    UPDATE      tblStatus 
    INNER JOIN  tblMapStatus
    ON          tblStatus.ID = tblMapStatus.ID
    SET         tblStatus.ID = tblMapStatus.NewID

Once again, this process must be repeated for all helper/lookup tables and where necessary, cascade updates must be turned on so that changes imposed by the mapping tables will be propagated throughout the database. When finished, the mapping tables, as well as the lookup tables such as tblStatus can be discarded since their counterpart in the Site A database now contains all possible values.

Well, that describes the basic process but as you can see, it is a lot of work, and a lot of custom coding. I created a tool for one client that was specifically designed to help their customers merge copies of the database. My utility automated the process for this particular application, even providing an interface to assign new values for lookup tables like tblStatus described above. While my utility will not merge your data, it may help you develop your own tool to merge your databases.

If you would like to see my code, drop me a line at datafast@comcast.net.

» See All Articles by Columnist Danny J. Lesandrini

Mobile Site | Full Site