Best Practices and Technology in Software Delivery
28 Jan
I enjoyed the Windy City Perl Mongers meeting last night. I met a lot of neat people and heard a great talk on CouchDB and one on an AI aggregator. A diverse group of people showed up with some serious technical knowledge and skills. Iraq veteran and author of many Perl books, Brian D Foy was there. They were appreciative of OpenMake Software’s sponsorship of YAPC NA last year.
Josh McAdams organized and hosted the meeting at the Chicago Google offices, where free beer, wine and milk were available in the fridge. Afterwards, we went to Rock Bottom Brewery and lasted until after midnight.
26 Dec
OK, this topic might be a snoozer, but if we’re going to do build management for our RDBMS (Oracle, SQL Server, etc.) in a revolutionary new way, we need to compare what’s going on with database source code changes and builds and compare that with what we already know.
We said that when we make a runtime change to a database, we are only applying changes on top of what we already have, but in J2EE for Java, we replace the entire running application with another instance of the entire application. This is not an incremental deployment in any sense.
If we compare with the case for C/C++, our source code change might result in replacing one of the application’s executables with a new one. OK, this is a more incremental, but maybe I changed one C source file, resulting in one object file change. I still have to rebuild a new executable with possibly many additional unchanged object files.
For both Java and C, traditional build management technologies allow for incremental builds. That means, if I only change a subset of the source code, a build can be done that re-compiles the minimum number of files, taking into account the full impact of each file change. (Many applications have lost the ability to do incremental builds, but Meister can get it back.) So, for Java and C, there should be the ability to do an incremental build, but when that build step is complete, the runtime environment remains unchanged. A separate deployment step needs to happen which is less incremental to some degree.
For the database changes, there is only a single step combining both build and “deployment” and it is always incremental. When you do the build, the runtime environment is changed immediately. So, as I mentioned earlier, there are differences in build management for RDBMS’s and operating system/JVM applications. Again, let that not deter us from bringing those changes under the umbrella of a common build management system.
20 Dec
The majority of the time database changes (regardless of whether you are using Oracle, SQLServer, Sybase, DB2, MySQL or something else) coincide with application changes in a typical business application. For example, an application change request indicates that the Java application needs to start using a new column in an existing database. So, in order for the new version of the application to function correctly, you need to alter the existing database table to add the new column. There are some fundamental differences in how the two changes are manifested.
For the Java application, deployed to WebSphere, JBoss or other app server, let’s say you started out with a single Java class, foo.java, and this application change requires you to add a second class, bar.java. Typically, I would recompile both classes and bundle them together in an archive (a ZIP file) and ship the archive out to the runtime environment. This has the effect of replacing foo.java, whether or not it has changed along with adding bar.java. For the database change, however, you can only apply the ALTER statement to add the column to the table. You do not re-issue a command to recreate that table. Even if you issued commands to drop the table and recreate it, before applying the ALTER, you would lose all the data in the table (unless you dumped the data before hand and re-imported it).
Following J2EE standards for Java development, the whole application is completely replaced every single time it is changed, while for the database, only the changes needed are applied to the existing configuration. In other language, for Java, the existing runtime configuration is completely replaced, while for the database only a configuration delta is applied.
So, there are some significant differences between Java and database changes, but that doesn’t stop you from managing database changes effectively.
18 Dec
In many ways, databases are runtime systems comparable to other programmable environments such as operating systems and application servers based on Java virtual machines. All environments typically have their own specialty engineering support teams in larger companies and their own preferred programming languages for making changes: Java for application server environments such as WebSphere and JBoss, C/C++/.NET for the Linux, UNIX and Windows operating systems; and, PL/SQL/DDL for database changes.
While there are all sorts of tools that allow directly changing databases through an IDE (Interactive Development Environment), and even versioning changes, one trend has become clear in corporate environments over the last decade. That is the desire to manage database changes within their existing change and configuration management infrastructure. Going forward in this blog, I’ll address some of the challenges both organizationally and technically with doing this. The good news is that, yes it can be and is being done successfully. I will also address how Meister contributes to database change control and integrates database changes with the organization’s overall build management practice.
25 Aug
Databases also have source code that is compiled. While there are clear differences between compiling for a database and compiling binaries for a hardware/operating system combo or virtual machine, there are many similarities.
The clearest difference between a database build and a C or Java build is that there are no easy to identify output files. With C or Java, you end up with new files that you normally transfer to a runtime environment. With databases, it is usually not obvious what is happening after you make your changes.
But databases do have source code. The first step is to recognize that DDL or Oracle PL/SQL is every bit as much source code as C or Java source. This source should be managed in version control and subject to the same life cycle management and controls as any other application source.
The next step is to think of the SQL/DDL/PL interpreter as a compiler. Oracle’s interpreter is sqlplus, while both Sybase and Microsoft SQL Server have a program called isql.
Now a trick is required. By using this trick, one can easily manage database builds with traditional file-based build systems. The concept behind the trick is to use a proxy file. A proxy file is a file that represents the output of a database compile in the build area, even though the database interpreter itself did not create one. Let’s say you want to make a change to a database table by applying an ‘ALTER’ statement. Put the statement in its own file. This is the source code. Now operate on the source code by passing it to the interpreter in the same way you would pass C or Java code to a compiler. Whereas the compilers create some kind of output files, you have to create your own fake output file. But this file will have the timestamp of the compile operation, which is also (approximately) the last modified time of the table. You can now manage database changes in a traditional incremental make system.
With make, you can perform an incremental build by assuming that output files have later modification times than the source code if the build is up to date. If a source file is found to be modified after an output file, make can execute the minimum amount of necessary commands to rebuild any output files depending on the newer source.
You can get fancy pretty quickly from here. Most sophisticated database systems keep track of the last modified time of each table. Instead of pulling your own timestamp from the operating system you could query the database after you apply an operation for its timestamp value and use that for the modification time of the proxy file.
If your proxy files serve as a representation of a table or a view, you can even create a full set of proxy files without any source code as your starting point. This gives you an audit point if you have a secure file system for your build area. You can create a set of proxy files initially and then later compare the database timestamps with the proxy file timestamps to see if any alterations were made outside of the build system.
We’ve adapted the OpenMake product to manage database changes for several companies. OpenMake’s dependency management really helps to apply database changes in the correct order, insuring ALTER statements are applied before a procedure definition, for example. The make‘ish timestamp checking of OpenMake allows for incremental changes. This is handy if something breaks half way through. If you’ve applied your CREATE statements and something breaks afterwards, you know full well, re-applying the CREATE statements will produce an error message. This is not a real error, because the table is already created as part of this change. An incremental system, possibly with proxy files, will know that the CREATE statements completed successfully and there is no reason re-apply them. The incremental system will pick up where it left off, so that the source code can be corrected, re-introduced to the build area and then the changes applied from the point they were last successful.
Having an incremental system for making database changes allows only real errors to come through and removes the intelligent person from having to interpret error messages and determine if they are real or not. This allows for a much greater level of change automation and one reason I am eager to help develop this part of the product.