Sean Blanton

Agile Build, CI and Testing Automation

Archive for the ‘Build Tools’ Category

There has been some talk about using physical colored lamps to signal build results (Pragmatic Automation, Carlos Sanchez, Richard Durnall, Alberto Savoia).

The idea comes from Lean Manufacturing principles developed by Toyota, applied to software development. An “andon” paper lamp was a way to let other workers know there was a problem on the assembly line. For builds, the idea is that an amber light flashes while a build is running, letting other developers know not to start one; a green light signifies that the last build completed successfully; and a red light means the build is broken and needs to be fixed.

While I’m sure it is an entertaining exercise to set this up, some simple software automation could achieve the same results, and the lamp has obvious faults: what if I am working from home?

Email is the de facto standard for build notification. There is a hardcoded mailing list somewhere and you get the email, even if you don’t want it, and you have to go and check your mail to receive it.

Twitter, on the other hand, puts the user in control of the notifications in true Web 2.0 fashion. Tweets are also ideal for receiving notifications via cell phone, where email is generally not. Consider the following cases:

  • I can follow the build when I want, and unfollow when I want. No going to an admin to remove me from the list.
  • I can turn on device updates so I get an SMS message about the result (maybe an important release is coming up), or I can turn off device updates, say, when I’m on vacation.
  • I can retweet (forward) the build notification.
  • You can put links in the message. Meister uploads all build logs in HTML format to a web server and we would include the link to that.

About a year ago, I used Meister with a post-build activity to tweet the build result after a build. The activity was a simple Perl script using Net::Twitter. Let’s just say this experiment sort of fell on deaf ears.
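As a rough illustration of what such a post-build step has to do, here is a minimal Python sketch (the original activity was a Perl script, which is not reproduced here). The function names, the message layout and the example URL are all assumptions made for illustration, and the actual posting is left to whatever client you use, passed in as a callable.

```python
from typing import Callable

TWEET_LIMIT = 140  # message length limit assumed for this sketch


def format_build_tweet(project: str, status: str, log_url: str) -> str:
    """Compose a build notification that fits in one tweet.

    The log URL is kept intact; the project name is shortened if needed.
    """
    suffix = f" build {status}: {log_url}"
    room = TWEET_LIMIT - len(suffix)
    if room < 1:
        raise ValueError("status and log URL alone exceed the tweet limit")
    name = project if len(project) <= room else project[: room - 1] + "…"
    return name + suffix


def notify(post: Callable[[str], None], project: str, status: str, log_url: str) -> str:
    """Format the message and hand it to whatever client posts it."""
    message = format_build_tweet(project, status, log_url)
    post(message)
    return message


if __name__ == "__main__":
    # print stands in for a real Twitter client call (hypothetical host name)
    notify(print, "webapp-trunk", "FAILED", "http://buildserver.example.com/logs/1234.html")
```

Keeping the formatting separate from the posting means the same message could just as easily go to email or a chat room.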

Now I’m ready to roll with this again and you can follow my progress. I created a Twitter account, @builds, for this purpose. In a day or two, you’ll see some tweets from that guy from one of my builds.

I’m interested in what other people have to say. Meister already uses Eclipse RCP as its front end, but most other build tools are still command-line and stuck in the 1970s. Let’s bring build management tools into the 21st century with Web 2.0 features.

With our third nomination, we won the Jolt Award for product excellence in the change and configuration management category. You can view the announcement here.

It is just plain fun to run parallel workflows and builds and watch the activities and build steps light up the workflow monitor in real time like a Christmas tree. See this flash demo to see what I mean.

As customers go to machines with more and more cores, fewer machines are needed in the application lifecycle infrastructure, particularly for builds and code retrievals – the most resource intensive functions. This is helping to simplify the infrastructure, reduce maintenance and administration and drive down costs.

Several of our customers are running around 5000 builds and non-build workflows per month on two machines. The primary reason for two machines, in fact, is for disaster recovery, and the goal is to run both machines at less than half capacity so that in the event that one machine (or datacenter) fails, all the current capacity can be run as a contingency on the one machine that’s left.

Thread control is very simple with Meister and Mojo. Both use the omsubmit dependency manager program to handle this. Meister’s om program translates build events into workflow steps using omsubmit. The OMSUBMIT_MAX_USER_PROC value sets the maximum allowed number of threads.

You might think that if you are running dual quad-core build machines, you should set the max threads at 8. However, Meister posts each build operation to one thread and the associated logging operation to another. Compile operations notoriously use a lot of memory and CPU resources, but the logging operation posts to a server and waits for the operation to complete. Eight concurrent build operations therefore already occupy 16 threads, and there is really no disadvantage to setting the max threads at 16 or higher in this case, so go ahead and do it.

As a non-build workflow example, I worked on JBoss deployments to 48 Linux machines. The workflow was parallelized into 48 activities, each of which deployed to a single machine. The deployment activity was largely a remote execute operation that extracted archives on the remote machine. The extraction took about a second for a medium-sized application. Again, this is a waiting situation where machine resources are essentially idle while the thread is in use, so use more threads. The machine was a dual dual-core build machine and we set OMSUBMIT_MAX_USER_PROC to 50.

Watching the workflow monitor as the deployment ran, we could see roughly half of the machines light up (meaning actively running) at any one time and the entire deployment process synchronized all 48 machines in a little over two seconds.
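The effect is easy to reproduce outside of Meister. The following Python sketch is not how omsubmit works internally; it simply uses a standard thread pool, with a sleep standing in for the remote extraction, to show why a thread limit well above the core count pays off when the threads mostly wait. The host names, wait time and thread counts are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deploy(host: str, wait_seconds: float) -> str:
    """Stand-in for a remote extract: the local thread just waits."""
    time.sleep(wait_seconds)
    return host


def run_deployment(hosts, max_threads: int, wait_seconds: float) -> float:
    """Deploy to all hosts with a bounded thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        done = list(pool.map(lambda h: deploy(h, wait_seconds), hosts))
    assert done == list(hosts)  # every host was handled, in order
    return time.perf_counter() - start


if __name__ == "__main__":
    hosts = [f"linux{n:02d}" for n in range(1, 49)]  # 48 hypothetical targets
    for threads in (4, 24, 50):
        elapsed = run_deployment(hosts, threads, wait_seconds=0.05)
        print(f"{threads:2d} threads: {elapsed:.2f}s")
```

With a limit of 4, the 48 waits run in 12 rounds; with a limit near the number of targets, they overlap almost completely. The same reasoning does not hold for CPU-bound compile steps, which compete for cores rather than wait.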

So, don’t simply match your machine’s CPU threading capabilities – oversubscribe! Aim high for max threads and try to determine where your performance is optimized. I’d love to provide you with some metrics as a function of thread count, but usually once something is working it’s on to the next project. I barely have enough time to blog!

It’s very common to have a code check-out step be part of an integration build. It is far better not to check out code before a build. What? How is that possible?

Let me explain, Fred. The simple approach most of us take (and have to take when getting things started) has developers commit, commit, commit, and when it is time to deploy, check out the code, do a build, and then deploy the application. There is room here for both problems and optimizations. Doing a full check out of the code tree is more costly in terms of time than checking out only what has changed. Updating the code tree with a single commit is less costly than updating with a large number of commits.

You may be limited by the technology in hand and how much you’ve invested in learning the technology and possibly customizing it. For example, if your file control tool can only do a full check out of a source tree, or that’s the only command you had time to implement in order to meet the deadline, or you don’t trust your tool to do incremental updates, then you are basically running the longest builds possible.

On the other hand, if you could update the code tree every time a developer does a commit with only the changed files, then you are ready to execute a build at any moment. This requires some deft manipulation of your file control tool, and that’s why you don’t see it more often.
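To make the idea concrete, here is a minimal Python sketch of a post-commit update that touches only the files a commit changed. It is not tied to any particular file control tool: the function name is hypothetical, and it assumes your tool can hand a post-commit hook the lists of changed and deleted paths.

```python
import shutil
from pathlib import Path


def apply_commit(repo_dir: Path, tree_dir: Path, changed, deleted=()) -> None:
    """Bring tree_dir up to date with one commit's worth of changes.

    changed and deleted are paths relative to the root of both trees.
    """
    for rel in changed:
        target = tree_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(repo_dir / rel, target)  # copy only what the commit touched
    for rel in deleted:
        (tree_dir / rel).unlink(missing_ok=True)  # tolerate files already gone
```

A hook would call something like `apply_commit(Path("/repo/export"), Path("/build/tree"), ["src/A.java"], ["src/Old.java"])` after each commit (both paths are placeholders), so the build tree is already current when a build is requested.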

You might think “continuous integration” will take care of this. Developer commits, update checked out, build is run. However, you may end up with a build, test execution and deployment that takes longer than the typical time between developer commits. You still have to do incremental updates and it only solves the problem in cases with very low developer activity.

I’d like to point out one tool that does an excellent job of post-commit code checkout, CA Software Change Manager for Distributed. CA SCM (for short) is the tool formerly known as Harvest, from the company formerly known as Computer Associates. CA SCM is a highly scalable (thousands of developers) file control tool with a great lifecycle process model. Our very first customer at OpenMake Software is still using OpenMake/Meister with CA Harvest/SCM after 11 years. While we have a reseller arrangement with CA, our partnership with CA in services has extended to 14 years.

About 10 years ago, OpenMake Software developed an integration with what was then Platinum Technologies’ Harvest product, modeled after the now-dead Computer Associates product Endevor Workstation, which had an excellent post-action code tree update. (Endevor for z/OS, a.k.a. CA SCM for z/OS, is still very popular and has a similar functionality called ‘output libraries’ – following all this?) Our integration had the horrific name ‘Har-refresh’.

As product partners, we finally handed Har-refresh over to CA, who have turned it from an external add-on into a core functionality of the product called HRefresh (a better name). Rather than simply a post-commit check out, HRefresh updates the code tree after any action that updates a dynamic code view. This includes renames, deletions, commits, and code promotions and demotions. We like this because CA SCM does all the work and we cherry-pick sets of up-to-date code trees to build up an application source code stack for a build. We align Meister dependency directories with HRefresh-managed file system directories for a tight SCM (software configuration management) build.

This mechanism distributes the resource load for checking out code to times when builds are not required. It’s true that people often want to build as soon as their code is checked in (or promoted), but on average it is a very big net win in reducing build times.

This is just an example of the type of sophistication that is out there to prevent pre-build code check outs and save time on your builds.

One of OpenMake Software’s product strategies is to keep things simple. Build management is one of the most complex operations in all of the IT world, and one of our key benefits is to simplify, organize and automate the build process for development, testing and production.

We’ve seen a trend among our customers to simplify their build management infrastructure by going to fewer build machines with more CPU cores. Builds in particular use relatively more CPU resources than other resources as code is interpreted and compiled in memory and then finally written to disk. Reducing the total number of machines cuts rack space, procurement, administration and other IT overhead costs, with substantial savings for each machine eliminated.

Recently, I was at one of the big chip makers where they used dual quad-core CPU Linux machines for their development and builds. They had two machines and were able to control access to allow separate areas for development, testing and release builds in keeping with best practices. Having all the horsepower of 8 CPU cores on a single machine kept them from needing more machines.

Another customer does 6000 builds per month with Meister on just two build machines.

IBM, when selling BuildForge, likes to talk about big build server farms, because their tool does remote execution on multiple machines, as does Meister. However, BuildForge does not do builds at all. It can remotely execute your existing build scripts, but there is little real value add to that. BuildForge is also famously expensive. What happens over the next few years to the high investment in multi-machine remote execution software as the number of machines declines, perhaps dramatically?

A similar argument can be made for Electric Cloud’s Electric Accelerator product. It’s possible in some cases for C/C++ builds to gain an edge by pushing a compile operation to another machine and then bringing it back. You would only do this to gain access to additional CPU resources. In the past, you might have 8 build machines that Electric Accelerator would farm operations out to. Now, you can pull all those operations into a single machine and there is no need for that functionality. Also, you are stuck with converting your GNU makefiles into other GNU makefiles.

Meister is optimized for multi-core CPU build machines and offers multi-threaded capability to both build events and non-build workflow events. You know where your build is and there are fewer dependencies on network resources. Both BuildForge and Electric Accelerator add overhead to build administration to coordinate across multiple machines – a dying practice that no organization wants to invest in. Meister is the best bet for a future with fewer build machines with more horsepower.

Finding the blog Enterprise Maven made me decide to go back to the basics today. This blog is from 2006, but the best practices of production control ignored here go back decades. I’d like to point out that Oleg Gusakov, the author, wrote the blog in a very good spirit and seems like a nice guy. He just seems to be a bit naive about what’s been happening with software development in the enterprise.

In the first section, he assumes that the only enterprise build and deploy solution is one that is customized, while OpenMake Meister has been serving that role now for 12 years. He does correctly conclude that all the enterprises in the world should not be independently investing in the same type of build and deploy solution. It is a costly investment and this functionality should be productized. That’s exactly why we did it and why that is still one of our chief selling points.

He is right that developing a product that should be commoditized is a drain on the business. However, the converse, having a commercial product provide the functionality at a greatly reduced cost compared with one homegrown, provides a competitive advantage over those companies who don’t have such a product.

Through the middle of the article, again, I think Oleg is unaware of the heavy horse SCM products out there that provide a lot of the expected functionality. Tools like CA Harvest, Serena Dimensions and others are very complex and sophisticated n-tier products. They nevertheless do not provide build support, so by combining an enterprise file control tool with an enterprise build and workflow tool, Meister, you cover the required functionality.

Lastly, regarding the enterprise development lifecycle, he is right it is an oversimplification. I like his phrase that he hopes to “grow the meat.” At OM Services, we have “fully grown meat” and the enterprise lifecycle documents that we develop with our customers and clients are typically 50-80 pages in length. Here is where I review the generally accepted best practices, going back to the seventies with mainframe development. (NO, distributed platforms are not somehow different in the high level process!)

  1. It all starts with production control. Developers do not have access to production due to a fundamental conflict of interest. Maintaining business continuity trumps developers’ ease of delivery to production.
  2. Since someone else puts the code into production (or operates a tool which does so), this is the basis for separation of roles and responsibilities in the enterprise software development lifecycle.
  3. To ensure integrity of the production environment, the production build must be done by a group representing the business, not development. Developers do not do production builds.
  4. Working backwards, if you want your test environment to be as close as possible to production, you lock this down and prevent access to developers. This is usually the QA testing environment.
  5. Again, to avoid a conflict of interest, the QA testers should be working for the business, not the application development team.
  6. And it follows that the build for the QA environment is done by the business.
  7. The developers’ job from this perspective is delivering source code to the business, which wants to retain the source code and the ability to use it (meaning they can build it).
  8. The process of developers transferring source code to the QA build people was called “throwing it over the wall”. Now, the heavy horse SCM tools and Meister workflow make it easy to do this and allow variations of iterative development involving the QA environment.

Any type of continuous integration or agile development practice typically happens before the QA environment. Any development methodology for the enterprise must either take into account the fundamental conflict of interest between software change delivery and business continuity, or ignore it and remain entirely in front of QA.

If you are a developer, you can think of this as a loss of privilege, or you can be elated that other people are doing the dirty work for you and you can focus on the art and science of engineering business solutions. If you are really depressed, maybe you should be on the other side of the wall!

Mojo and Meister 7.2.1 On The Way

We’ve been testing Mojo and Meister 7.2.1 and getting ready for their release on December 15. This is a maintenance release with bug fixes from users who’ve started running builds and workflows with Meister 7.2 and using the free workflow automation of Mojo and putting those releases through their paces. It also contains a lot of UI and documentation improvements.

Existing users of the 7.2 version of either Mojo or Meister will be able to upgrade via the update sites, http://www.openmakesoftware.com/mojo/update_site and http://www.openmakesoftware.com/meister/update_site, respectively.

Users interested in getting Mojo, the free workflow automation tool, and Meister, the industry leading build automation tool, can find download instructions on our website, http://www.openmakesoftware.com.

We’ve heard from a number of companies that they are having problems with their WebSphere Eclipse headless builds and they are looking to us for a solution.

A headless build is an Eclipse function (and therefore an IBM WSAD/RAD IDE function) that allows you to execute Eclipse and some of its plug-ins at the command line. The GUI does not launch, but the main machinery does, and it makes use of the workspace metadata.

In theory this allows you to do the same build at the command line that was done in the GUI. It’s a nice theory that we subscribe to ourselves.

In practice, these companies do not feel they are successful for a number of reasons:

  1. If it works it is slow. A slow build is almost as bad as a broken build.
  2. Unexplained crashes in the middle of the build due to workspace corruption.
  3. Crashes may leave the .lock file in the .metadata folder of the workspace. This prevents Eclipse from executing in the same location until it is cleaned up.
  4. It’s a black-box build with a large number of operation swept
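The stale lock file in item 3 is the kind of cleanup that is easy to script as a pre-build step. The Python sketch below is only an illustration with a hypothetical function name; it assumes the lock sits at .metadata/.lock under the workspace root, as described above, and that you have already confirmed no Eclipse process is still using that workspace.

```python
from pathlib import Path


def clear_stale_lock(workspace: Path) -> bool:
    """Remove a leftover .metadata/.lock file; return True if one was removed.

    Only call this when no Eclipse process is using the workspace.
    """
    lock = workspace / ".metadata" / ".lock"
    if lock.is_file():
        lock.unlink()
        return True
    return False
```

Returning a flag lets the calling build step log that the previous run did not shut down cleanly, which is worth knowing when you are chasing workspace corruption.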

Meister takes a different approach and records the files and project types and then applies a traditional build with a mixture of Ant tasks and command line calls. This eliminates the need to even install RAD 7 on the build server, but you still need the runtime libraries.

I’ve worked with the IBM WebSphere EJBDeploy plug-ins and integrated Meister to actually call an Eclipse headless build via the WSAD/RAD ejbdeploy.bat file. As a rule, we always stick with the vendor-recommended approach, so we don’t have a choice. However, we don’t do the compiling and archiving this way – just the EJBDeploy step.

By compiling and archiving the Java parts in a traditional way, we at least minimize the risk of a broken headless build and maximize the speed. And, yes the ejbdeploy step is usually the longest step in the build. The fact that IBM supplies the ejbdeploy.bat file at all, which mimics a command-line compiler, means we’re not the only ones taking this approach.

cpmake – One-Off Make Variant

On a related note, people are still having trouble with make. Here’s a note about case preservation problems when doing cross-platform builds. Well, the problem is implied, given that this is a post describing a solution.

Given that this functionality has been missing from make for decades of doing cross-platform builds, I have to question if this is really necessary. Is the root cause of the problem something else?

Here comes buildr: yet another Java build tool. Hopefully I, or one of my cohorts, will check this out in detail soon. But, with my experience working with all manner of build tools, with 100 companies and many more development teams, I can already make a few observations.

First of all, why another build tool for Java? I am occasionally told that Maven or Ant is a perfect tool, but clearly the people behind buildr don’t think so. The choice of JRuby as the vehicle for delivering this tool is, I think, probably a good one. JRuby is a scripting language in the same vein as Perl, which is used by Meister.

Doing software builds is an ugly business involving lots of file and operating system interaction. This is not where Java shines, but scripting languages can. As long as operating systems are written in C and not Java, C-like tools will be better and faster at interacting with them. Plain Ruby itself is C-based, and JRuby no doubt inherits C-like operating traits. Calling out to a Java compiler from Perl or JRuby, though it has its own JVM, does not represent a significant overhead compared with the file system operations and the compilation/translation itself.

Both Maven and Ant are relatively difficult to extend compared with most other build tools, and I’ll bet buildr beats them here. If you have all the Maven plug-ins and Ant tasks you need, then good for you. If not, then you have to start developing in Java and it becomes too much of an investment to sink into a build system. It is much cheaper to extend in JRuby or Perl. My frequently cited example is the XMLBeans compile step in Meister, written in Perl, which is only 40 lines of real code. The Maven plug-in is 60 pages of Java code and no one can tell me really what it is doing (I asked on all the forums). Less code is usually more transparent, which is also good for build audits.

I am a little disappointed to see them try to placate the Maven and Ant users by promising it is a drop-in replacement for Maven and they have all the Ant tasks covered. Both tools have their drawbacks and I don’t want to see another tool with the same deficiencies. They should have the cajones (or coñejos) to apply all their resources to what they think is a better tool (with its own unique benefits and deficiencies). I imagine offering Ant task equivalents is pretty easy because of the ease of coding in JRuby compared with Java.

They also don’t mention who is supposed to use the tool. Is it for individuals, small development teams, the enterprise? Maven falls short because it is only appropriate for development teams and not for stable, controlled, enterprise builds. Ant is not even a tool, but a means to create some tools for small teams. I don’t think Meister will fear buildr either.

Well, since buildr is only in incubation status with Apache, I’m not sure how much time I’ll be able to spend on it, but I am curious and I’ll let you know if I find out more.