Agile Build, CI and Testing Automation
25 Aug
Databases also have source code that is compiled. While there are clear differences between compiling for a database and compiling binaries for a hardware/operating system combo or virtual machine, there are many similarities.
The clearest difference between a database build and a C or Java build is that there are no easy to identify output files. With C or Java, you end up with new files that you normally transfer to a runtime environment. With databases, it is usually not obvious what is happening after you make your changes.
But databases do have source code. The first step is to recognize that DDL or Oracle PL/SQL is every bit as much source code as C or Java source. This source should be managed in version control and subject to the same life cycle management and controls as any other application source.
The next step is to think of the SQL/DDL/PL interpreter as a compiler. Oracle’s interpreter is sqlplus, while both Sybase and Microsoft SQL Server have a program called isql.
Now a trick is required. By using this trick, one can easily manage database builds with traditional file-based build systems. The concept behind the trick is to use a proxy file. A proxy file is a file that represents the output of a database compile in the build area, even though the database interpreter itself did not create one. Let’s say you want to make a change to a database table by applying an ‘ALTER’ statement. Put the statement in its own file. This is the source code. Now operate on the source code by passing it to the interpreter in the same way you would pass C or Java code to a compiler. Whereas the compilers create some kind of output files, you have to create your own fake output file. But this file will have the timestamp of the compile operation, which is also (approximately) the last modified time of the table. You can now manage database changes in a traditional incremental make system.
With make, you can perform an incremental build by assuming that output files have later modification times than the source code if the build is up to date. If a source file is found to be modified after an output file, make can execute the minimum amount of necessary commands to rebuild any output files depending on the newer source.
You can get fancy pretty quickly from here. Most sophisticated database systems keep track of the last modified time of each table. Instead of pulling your own timestamp from the operating system you could query the database after you apply an operation for its timestamp value and use that for the modification time of the proxy file.
If your proxy files serve as a representation of a table or a view, you can even create a full set of proxy files without any source code as your starting point. This gives you an audit point if you have a secure file system for your build area. You can create a set of proxy files initially and then later compare the database timestamps with the proxy file timestamps to see if any alterations were made outside of the build system.
We’ve adapted the OpenMake product to manage database changes for several companies. OpenMake’s dependency management really helps to apply database changes in the correct order, insuring ALTER statements are applied before a procedure definition, for example. The make‘ish timestamp checking of OpenMake allows for incremental changes. This is handy if something breaks half way through. If you’ve applied your CREATE statements and something breaks afterwards, you know full well, re-applying the CREATE statements will produce an error message. This is not a real error, because the table is already created as part of this change. An incremental system, possibly with proxy files, will know that the CREATE statements completed successfully and there is no reason re-apply them. The incremental system will pick up where it left off, so that the source code can be corrected, re-introduced to the build area and then the changes applied from the point they were last successful.
Having an incremental system for making database changes allows only real errors to come through and removes the intelligent person from having to interpret error messages and determine if they are real or not. This allows for a much greater level of change automation and one reason I am eager to help develop this part of the product.
Leave a reply