The “changing the database” experience

I have been involved in Enterprise JAVA projects since 1998. Almost all projects have had an RDBMS “behind” it, mostly Oracle and DB2.

All the projects have been some custom defined application often based on existing infrastructure, and the database vendor and version has been predefined from the beginning.

Still, there has almost always been a more or less explicit demand that the application being developed should be easily ported to another database vendor. Until now, this has never happened to me; I have never been in a situation where the customer suddenly decides to change their entire database infrastructure from one major database vendor to another.

Except for one situation: I was involved in a project where we were developing a version-2 of an existing application. The original application was based on Microsoft tools and databases; the new application was to be based on J2EE and Oracle.

The demand that the application should be easily portable to another database basically has the following impact on the development:

  • Use a good, preferably a standard-, ORM tool can help hide technical details about the database integration, and can enable you to write your queries in an “abstract query language”.
  • Minimize the use of native SQL, vendor specific database features, stored procedures, triggers and user defined functions. They will have to be portable and a database specialist is often needed. Some features in one RDBMS might not have comparable features in other RDBMS’.
  • Hard coding SQL in the application is forbidden – all SQL must be in configuration files external to the code to ease translation.

Of course, if the application is a general component or a product to be used by many different customers or users, the demands on the database integration is even harder – you can only rely on the most basic features common to all the supported databases.

There are other good reasons for points mentioned above, but in this article I’m taking the perspective of creating applications that are easily ported to other databases.

In the following I will discuss some of the experiences that I have had with a recent project, where we ported the entire persistence layer from Oracle to PostgreSQL.

The application

The application is a component or subsystem that is to be integrated into a larger application. It is a J2EE 1.4 component implemented using standard J2EE technologies.

Version 0.5 of the application – the first version actually delivered to the customer as a “preview” – is based on IBM WebSphere version 5.1.2 and Oracle 10g. The entire persistence layer was implemented with EJB 2.0 CMP.

The application has about 20 entities with Container Managed Relations. Some entities have many-to-many relations implemented with join tables.

Four very specialized queries were implemented as stored procedures for performance reasons. A small part of the application framework was designed in order to be able to call a stored procedure and build a result in a database independent way.

JPA

One of my first tasks on the project was to replace the existing EJB 2.0 CMP based persistence layer with JPA. Version 1.0 of the application is going to be deployed on WebSphere version 6.1 (J2EE 1.4), still with Oracle 10g as RDBMS.

Existing EJB-QL is more or less directly reused as JPA-QL named queries, 3 out of 4 stored procedures are reimplemented as named native queries. All native queries are placed in a specific “native_queries.xml” file, linked from “persistence.xml” via a <mapping-file> element.

The existing rather complex infrastructure designed to be able to call stored procedures are rewritten as a much simpler “quick-and-dirty” implementation, using “plain” JDBC. The expectation is that the last stored procedure can be implemented later as a complex native query.

This conversion process is over in less than 3 weeks, with a lot of help from about 150 unit tests. In the process I stumble into several serious issues with Hibernate-JPA, one where I develop a patch (see An Open Source encounter). Some queries need to be rewritten as native queries to “code around” issues with Hibernate.

Moving project

The project is initially owned, hosted and developed by one company (A). Here the developers are equipped with powerful desktop computers. Several servers are dedicated to the project, one as a database server.

Each developer has a personal development schema on the database server – testing and debugging is not possible without a network connection to the database server. This is no problem as each developer is physically located at (A) when working.

Suddenly, the project is going to “move”. Company (B) is going to take over the project. The external consultants are to “move along”, new people are hired by (B) and some of (A)’s developers, testers and business specialists are borrowed by (B). All are to be physically located at (B).

At (B) everybody is equipped with reasonable powerful notebooks. And there is a wish that developers be able to work “disconnected”, e.g. at home. But, the developers will not be able to do testing without connection to the database…

One solution could be to simply install an Oracle instance on each developer notebook. But we were concerned about resource problems and possible licensing issues.

We decided we should try to look at an Open Source alternative – the Lead Architect suggested we looked at MySQL. Now, I don’t like MySQL at all (see Why I Hate MySQL), but have pretty good experience with PostgreSQL from various hobby projects. And, in my point of view, PostgreSQL looks a lot more like Oracle than MySQL does… So I suggested we should go for PostgreSQL which we did – we went for PostgreSQL version 8.2.4.

Note: Installing and configuring PostgreSQL version 8.2.4 on a standard Windows XP computer is a simple task done in a few minutes.

The schema

First part of the port was to rewrite the Oracle specific DDL in something “less Oracle specific”. We are going to port the component to PostgreSQL, but by doing it we try to introduce as few PostgreSQL specific features as possible.

I spent a few hours going through a createTables.sql – converting it table by table and testing the DDL on PostgreSQL in the process.

Below is an example of the Oracle DDL:

CREATE TABLE THETABLENAME (
	UUID CHAR(32 BYTE) NOT NULL ENABLE,
	TYPE CHAR(1 BYTE) NOT NULL ENABLE,
	AMOUNT NUMBER(14,2) NOT NULL ENABLE,
	VALIDTS TIMESTAMP (6) NOT NULL ENABLE,
	SPECIALKEY VARCHAR2(12 BYTE)  NOT NULL ENABLE,
	BALANCE NUMBER(12,2) NOT NULL ENABLE,
	QUANTITY NUMBER(10,0) NOT NULL ENABLE,
	ID VARCHAR2(36 BYTE) NOT NULL ENABLE,
	CONSTRAINT PK_THETABLENAME PRIMARY KEY (UUID) ENABLE
);

The same DDL for PostgreSQL:

CREATE TABLE THETABLENAME (
  	UUID CHAR(32) NOT NULL,
	TYPE CHAR(1) NOT NULL,
	AMOUNT DECIMAL(14,2) NOT NULL,
	VALIDTS TIMESTAMP(6) NOT NULL,
	SPECIALKEY VARCHAR(12) NOT NULL,
	BALANCE DECIMAL(12,2) NOT NULL,
	QUANTITY INTEGER NOT NULL,
	ID VARCHAR(36) NOT NULL,
	CONSTRAINT PK_THETABLENAME PRIMARY KEY (UUID)
);

The table below illustrates some of the conversions being used:

Oracle PostgreSQL / SQL
NUMBER(n,m) DECIMAL(n,m)
VARCHAR2(n BYTE) VARCHAR(n)
CLOB TEXT
NUMBER(n,0) INTEGER or BIGINT

PostgreSQL in WebSphere

WebSphere 6.1 does not have native PostgreSQL support, so one has to create a “Generic” JDBC provider, using the org.postgresql.xa.PGXADataSource implementation class for an XA-enabled provider. Creating the data source one must select the com.ibm.websphere.rsadapter.GenericDataStoreHelper – this gives a warning in the console, but it seems to work fine.

Stored procedures

When I was done with the schema DDL and setting up data sources, I felt lucky. I ran the entire test-suite expecting only a few failures – I knew I had one stored procedure (SP) yet to implement in PostgreSQL. Wrong!

Almost all the unit tests failed in the setUp() method while calling some TestServices.loadTestData() method… It appeared that the component had a “secret” TestServices service that called some stored procedures to setup and teardown test-data for the unit tests. And doing it using some crude JDBC code that short-circuited the entire application framework…

Now I knew I had not one SP to reimplement in PostgreSQL – I had 4-5 different SP’s I had to take into consideration. And all the SP’s were called from different parts of the component with its own “proprietary” JDBC implementation. I felt like introducing a simple DAO layer to isolate most of the rudimentary JDBC code.

Calling a SP that returns a result-set has different syntax and other implementation details between Oracle and PostgreSQL. The Oracle version is like the following:

...
CallableStatement stmt = conn.prepareCall("{ call ? := somefunction(?, ?) }");
stmt.registerOutParameter(1, oracle.jdbc.OracleTypes.CURSOR);
...

The PostgreSQL version:

...
CallableStatement stmt = conn.prepareCall("{ ? = call somefunction(?, ?) }");
stmt.registerOutParameter(1, java.sql.Types.OTHER);
...

That helped – my simple DAO implementation that, based on some configuration parameters chose the right backend implementation, would hide those tedious details. And while introducing my simple DAO-like framework, I also reintroduced parts of the application framework that dealt with SP access – stuff I had just recently removed 🙂

The SP’s related to the test-cases alone are pretty simple; they either insert some standard data into a couple of tables or deletes stuff… Porting them to PostgreSQL was pretty straightforward, except for one thing: they used a sys_guid() function to generate unique 32 character hexadecimal keys. Now, for this to work with the component, I did not need a “true” guid generator; a function that could generate reasonably unique random 32 character hexadecimal keys was sufficient:

CREATE OR REPLACE FUNCTION sys_guid() RETURNS CHAR(32) AS $BODY$
DECLARE 
	ret CHAR(32);
BEGIN
	SELECT UPPER(md5(now() || to_char(random(),'0.0000000000000'))) INTO ret;
	RETURN ret;
END;
$BODY$ LANGUAGE plpgsql volatile;

The one business related SP and some of the native queries used the Oracle trunc() funtion to extract the date (only, not time information) from a DATE or TIMESTAMP. PostgreSQL does not have similar trunc() function, so a user-defined function was created in PostgreSQL that would behave like Oracle-trunc() while given DATEs or TIMESAMPs as function arguments. Looks like this:

CREATE OR REPLACE FUNCTION trunc(TIMESTAMP WITH TIME zone) RETURNS DATE AS $BODY$
BEGIN
	RETURN date_trunc('day', $1);
END;
$BODY$ LANGUAGE 'plpgsql' IMMUTABLE STRICT;

Finally I had to work with the one business related SP left. It starts out filling some temporary tables with intermediate calculations. This is pretty simple with Oracle: the temporary tables are created as GLOBAL TEMPORARY by the schema DDL. With PostgreSQL, temporary tables need to be created in the session that needs them. This then has to be done in the SP. But, then we cannot call the SP more than once in a session. Or we need to create an exception handler in the SP…

Most of the PostgreSQL implementation of the SP is identical to the Oracle version, except for the initial temporary table creation:

BEGIN
	CREATE GLOBAL TEMPORARY TABLE TEMP_X (
 		ID CHAR(12) NOT NULL,
 		VALUE DECIMAL(12,2) NOT NULL
	) ON COMMIT DELETE ROWS;
 
	CREATE GLOBAL TEMPORARY TABLE TEMP_Y (
 		ID CHAR(12) NOT NULL,
 		Y_UUID CHAR(32) NOT NULL,
 		Y_ID VARCHAR(36) NOT NULL,
 		STATE VARCHAR(3),
 		Z_UUID CHAR(32) NOT NULL
	) ON COMMIT DELETE ROWS;
EXCEPTION WHEN duplicate_table THEN
-- ignore
END;

Native queries

All the native queries in the system are isolated into a native_queries.xml. I looked them over to see if perhaps there should be any oracle specific SQL. Except for the usage of the trunc() function to extract dates from timestamps, everything was pretty standard SQL and no translation was needed.

Now I ran all the tests again and only a few system-related tests failed. Great! The reason was that I still needed to change the Hibernate “dialect” to generate PostgreSQL specific native SQL instead of Oracle. When done, all tests ran without error!

Conclusion

The one single item that I spent most time on was creating the PostgreSQL schema DDL. And not only was this process time-consuming, it was also boring :-).

The porting of the stored procedures and infrastructure code was the second biggest item.

To summarize the findings:

  • Try to keep native features out of the schema DDL if possible. If you are going to support n databases, you will have to maintain n database specific DDL scripts!
  • Try to minimize the use of stored procedures. They can be difficult if not impossible to port. Infrastructure code to call the stored procedures is database dependent too.
  • If you have to use native queries, try to use as few database specific features as possible. If you are using “clean” SQL only they should run on any database.
  • If you have to resort to some “hack” to be able to call the database in a non-standard manner, isolate all the “hack” code in one place. Even if it is only used for testing!
  • Using a good ORM tool with its own database-independent query language helps a lot! None of the JPA-QL written in this project had to be touched for the code to work on PostgreSQL!

But all in all this has been a good experience. PostgreSQL is a great database and porting from Oracle has been pretty easy. With about 2 days work, all developers are now able to run disconnected with their own PostgreSQL instance. And while doing the port, we found several places in the application that needed improvement.

About Jesper Udby

I'm a freelance computer Geek living in Denmark with my wife and 3 kids. I've done professional software development since 1994 and JAVA development since 1998.
This entry was posted in Databases, Java and tagged , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.