Database migration projects with GuttenBase – Copying done right

There are many tools to visualize or analyze databases. You will also find lots of programs to copy databases between different vendors. However, we experienced  these tools are not flexible enough for our migration projects. They fail because, e.g., they cannot map the various data types between different databases correctly, or because the amount of data becomes too big. The solution we suggest is to program sophisticated data migrations using an extensible framework instead of configuring some (limited) tool. We found that this approach gives us much more flexibility when performing data migrations. Migrating a database almost always requires a custom solution, since every system has its peculiarities. Another advantage of “programming” a migration is that your developers may freely combine plain copying code with computational parts. For example, it may be necessary  to contact a third-party system during a migration process in order to obtain some information. In one of our projects we had to contact a GIS (Geographic information system) server to relate the positional IDs stored in the database with those in the GIS database.

Example

The following code example first checks whether the source and destination are compatible. Then the values from the source database are written to the target database. Finally, we perform (empirical) tests whether the data has been transmitted correctly.

ConnectorRepository repo = new ConnectorRepositoryImpl();
repo.addConnectionInfo("MySql", new MySqlConnectionInfo());
repo.addConnectionInfo("Postgresql", new PostgresqlConnectionInfo());

new CheckSchemaCompatibilityTool(repo).
  checkTableConfiguration("MySql", "Postgresql");
new DefaultTableCopyTool(repo).
  copyTables("MySql", "Postgresql");
new CheckEqualTableDataTool(repo).
  checkTableData("MySql", "Postgresql");

Configuration

The source database will be accessed read-only, whereas the target database connection is configured to disable auto-commit and database constraints. This allows for efficient and order-independent copying of data. The programmer may influence the process in various ways by overriding the life cycle methods of the framework, e.g.

public void beforeTableCopy(...) { … }
public void afterInsert(...) { … }

The copying process itself may be influenced by various configuration hints: You may declare the mapping of tables or columns between databases, define the order of the copied tables, filter or drop redundant data, and most importantly, transform the data while it is being copied. To give an real-life example: We once had the idea of transforming all primary keys in the data base (Long) to Universally Unique Identifier (UUIDs) while of course all references between tables needed to be updated, too. There are also database specific hints that configure, e.g., the number of rows that will be committed per batch.

Conclusion

We found the  GuttenBase framework very useful in our database migration projects. It gives programmers complete control over the transformation process and is extensible in many ways. Some of its features are:

  • Export complete database into a single JAR/ZIP file and import it elsewhere without an online connection.
  • Possibility to combine transformation code with third-party libraries.
  • Add analysis code to inspect collected database meta data.

The GuttenBase framework is an Open Source project hosted at GitHub. There is also a web site with documentation and examples at http://guttenbase.spree.de/.

Posted in Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s