Agile Build, CI and Testing Automation
5 Mar
It is just plain fun to run parallel workflows and builds and watch the activities and build steps light up the workflow monitor in real time like a Christmas tree. See this flash demo to see what I mean.
As customers go to machines with more and more cores, fewer machines are needed in the application lifecycle infrastructure, particularly for builds and code retrievals – the most resource intensive functions. This is helping to simplify the infrastructure, reduce maintenance and administration and drive down costs.
Several of our customers are running around 5000 builds and non-build workflows per month on two machines. The primary reason for two machines, in fact, is disaster recovery: the goal is to run both machines at less than half capacity so that if one machine (or datacenter) fails, the entire current load can run on the one machine that’s left.
Thread control is very simple with Meister and Mojo. Both use the omsubmit dependency manager program to handle this. Meister’s om program translates build events into workflow steps using omsubmit. The OMSUBMIT_MAX_USER_PROC value sets the maximum allowed number of threads.
You might think that if you are running dual quad-core build machines you should set the max threads to 8. However, Meister posts each build operation to one thread and the associated logging operation to another, so a limit of 8 would keep only about four compiles going at once. Compile operations notoriously use a lot of memory and CPU resources, but the logging operation posts to a server and mostly waits for it to complete. There is really no disadvantage to setting the max threads to 16 or higher in this case, so go ahead and do it.
As a non-build workflow example, I worked on JBoss deployments to 48 Linux machines. The workflow was parallelized into 48 activities, each of which deployed to a single machine. The deployment activity was largely a remote execute operation that extracted archives on the remote machine. The extraction took about a second for a medium-sized application. Again, this is a waiting situation where the build machine’s resources are essentially idle while the thread is in use, so use more threads. The machine was a dual, dual-core build machine and we set OMSUBMIT_MAX_USER_PROC to 50.
Watching the workflow monitor as the deployment ran, we could see roughly half of the machines light up (meaning actively running) at any one time and the entire deployment process synchronized all 48 machines in a little over two seconds.
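The effect is easy to reproduce on any machine. The sketch below is not Meister or omsubmit. It is a plain shell stand-in that assumes your xargs supports the common -P option, simulates each remote extraction with a one-second sleep, and uses a hypothetical function, run_deployments, whose first argument plays the role that OMSUBMIT_MAX_USER_PROC plays above.

```shell
#!/bin/sh
# run_deployments CAP HOSTS
# Simulate HOSTS deployments that mostly wait (sleep 1), with at most CAP
# running at once, and print the elapsed wall-clock time in whole seconds.
# CAP is a hypothetical stand-in for a cap such as OMSUBMIT_MAX_USER_PROC.
run_deployments() {
    cap=$1
    hosts=$2
    start=$(date +%s)
    i=1
    while [ "$i" -le "$hosts" ]; do
        echo "host$i"          # one line per simulated target machine
        i=$((i + 1))
    done | xargs -n 1 -P "$cap" sh -c 'sleep 1' _
    end=$(date +%s)
    echo $((end - start))
}

echo "cap 2:  $(run_deployments 2 4)s"
echo "cap 50: $(run_deployments 50 4)s"
```

With the cap at 2, the four simulated deployments should take roughly two seconds. With the cap well above the task count they should finish in roughly one, because the tasks spend their time waiting rather than computing.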
So, don’t simply match your machine’s CPU threading capabilities: oversubscribe! Aim high for max threads and try to determine where your performance is optimized. I’d love to provide you with some metrics as a function of thread count, but usually once something is working it’s on to the next project. I barely have enough time to blog!
17 Feb
It’s very common to have a code check-out step be part of an integration build. It is far better not to check out code before a build. What? How is that possible?
Let me explain, Fred. The simple approach most of us take (and have to take when getting things started) has developers commit, commit, commit, and when it is time to deploy, check out the code, do a build, and then deploy the application. There is room here for both problems and optimizations. Doing a full check out of the code tree is more costly in terms of time than checking out only what has changed. Updating the code tree with a single commit is less costly than updating with a large number of commits.
You may be limited by the technology in hand and how much you’ve invested in learning the technology and possibly customizing it. For example, if your file control tool can only do a full check out of a source tree, or that’s the only command you had time to implement in order to meet the deadline, or you don’t trust your tool to do incremental updates, then you are basically running the longest builds possible.
On the other hand, if you could update the code tree every time a developer does a commit with only the changed files, then you are ready to execute a build at any moment. This requires some deft manipulation of your file control tool, and that’s why you don’t see it more often.
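As a rough illustration of the idea, here is a minimal sketch that uses Git as a stand-in for whatever file control tool you have. It is not how CA SCM or Meister does it. The function name refresh_tree is hypothetical, and it assumes the build tree is an existing clone whose upstream is already configured. Run from whatever post-commit trigger your tool offers, it brings the tree up to date and lists only the files that changed.

```shell
#!/bin/sh
# refresh_tree DIR
# Fast-forward the existing checkout in DIR and print the files that changed.
# Assumes DIR is a Git clone with an upstream configured (hypothetical setup).
refresh_tree() {
    tree=$1
    old=$(git -C "$tree" rev-parse HEAD)        # revision before the update
    git -C "$tree" pull --quiet --ff-only       # transfer and apply only new commits
    git -C "$tree" diff --name-only "$old" HEAD # files touched by the update
}

# Optional command-line use: sh refresh_tree.sh /path/to/build/tree
if [ $# -gt 0 ]; then
    refresh_tree "$1"
fi
```

The build tree is never thrown away, so the cost of each update is proportional to the change rather than to the size of the source tree.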
You might think “continuous integration” will take care of this: a developer commits, the update is checked out, and a build is run. However, you may end up with a build, test execution and deployment that together take longer than the typical time between developer commits. Continuous integration still depends on incremental updates, and on its own it only solves the problem when developer activity is very low.
I’d like to point out one tool that does an excellent job of post-commit code checkout: CA Software Change Manager for Distributed. CA SCM (for short) is the tool formerly known as Harvest, from the company formerly known as Computer Associates. CA SCM is a highly scalable (thousands of developers) file control tool with a great lifecycle process model. Our very first customer at OpenMake Software is still using OpenMake/Meister with CA Harvest/SCM after 11 years. While we have a reseller arrangement with CA, our partnership with CA in services has extended to 14 years.
About 10 years ago, OpenMake Software developed an integration with what was then Platinum Technologies’ Harvest product, modeled after the now dead Computer Associates product Endevor Workstation, which had an excellent post-action code tree update. (Endevor for z/OS, a.k.a. CA SCM for z/OS, is still very popular and has a similar functionality called ‘output libraries’. Following all this?) Our integration had the horrific name ‘Har-refresh’.
As product partners, we finally transitioned Har-refresh from an external add-on to CA, which has turned it into core functionality of the product called HRefresh (a better name). Rather than simply performing a post-commit check out, HRefresh updates the code tree after any action that updates a dynamic code view. This includes renames, deletions, commits, and code promotions and demotions. We like this because CA SCM does all the work and we cherry-pick sets of up-to-date code trees to build up an application source code stack for a build. We align Meister dependency directories with HRefresh-managed file system directories for a tight SCM (software configuration management) build.
This mechanism shifts the resource load of checking out code to times when builds are not required. It’s true that people often want to build as soon as their code is checked in (or promoted), but on average it is a very big net win in reducing build times.
This is just an example of the type of sophistication that is out there to prevent pre-build code check outs and save time on your builds.
27 Jan
Finding the blog Enterprise Maven made me decide to go back to the basics, today. This blog is from 2006, but the best practices of production control ignored here go back decades. I’d like to point out that Oleg Gusakov, the author, wrote the blog in a very good spirit and seems like a nice guy. He just seems to be a bit naive about what’s been happening with software development in the enterprise.
In the first section, he assumes that the only enterprise build and deploy solution is one that is customized, while OpenMake Meister has been serving that role now for 12 years. He does correctly conclude that all the enterprises in the world should not be independently investing in the same type of build and deploy solution. It is a costly investment and this functionality should be productized. That’s exactly why we did it and why that is still one of our chief selling points.
He is right that developing a product that should be commoditized is a drain on the business. However, the converse, having a commercial product provide the functionality at a greatly reduced cost compared with one homegrown, provides a competitive advantage over those companies who don’t have such a product.
Through the middle of the article, again, I think Oleg is unaware of the heavy horse SCM products out there that provide a lot of the expected functionality. Tools like CA Harvest, Serena Dimensions and others are very complex and sophisticated n-tier products. They nevertheless do not provide build support, so by combining an enterprise file control tool with an enterprise build and workflow tool, Meister, you cover the required functionality.
Lastly, regarding the enterprise development lifecycle, he is right it is an oversimplification. I like his phrase that he hopes to “grow the meat.” At OM Services, we have “fully grown meat” and the enterprise lifecycle documents that we develop with our customers and clients are typically 50-80 pages in length. Here is where I review the generally accepted best practices, going back to the seventies with mainframe development. (NO, distributed platforms are not somehow different in the high level process!)
Any type of continuous integration or agile development practice typically happens before the QA environment. Any development methodology for the enterprise must either take into account the fundamental conflict of interest between software change delivery and business continuity, or ignore it and remain entirely in front of QA.
If you are a developer, you can think of this as a loss of privilege, or you can be elated that other people are doing the dirty work for you and you can focus on the art and science of engineering business solutions. If you are really depressed, maybe you should be on the other side of the wall!
31 Dec
As one who has done many version control tool A to version control tool B conversions, I know how difficult such a task is. That’s why I am all the more impressed that 20+ years of Perl history from multiple repositories have been converted to a single Git repository.
I can’t add much more about the benefits than the announcement itself:
3 Nov
I found myself actually using four different file control tool plug-ins in a single Eclipse 3.4 workspace. This is not to show off; there are legitimate needs. Before proceeding, let me disclaim that I am reorganizing my Perl development on a new machine and I have everything somewhat haphazardly in a single workspace. Ideally I will have different workspaces for different projects, but until I build a standard set of preferences, particularly for EPIC Perl templates, that I can export and import into different workspaces, I’m locked into a single workspace for now.

If you are not familiar with Eclipse and version control (or as I call it generically “file control”) you have to install plug-ins that provide the functionality to interface with different tools. I have an EPIC plug-in that provides Perl tools, and I’ve installed EGIT for Git integration and plug-ins for Subversion and Bazaar. The CVS plug-in actually comes as part of the base Eclipse install, though that status is questionable given the popularity of Subversion and the rapid rise of Git.
These plug-ins provide the capability to create a new project from the contents of the file control repository, or attach an existing Eclipse project to a new project under file control. You do this by right-clicking on the project and going to the “Team” menu and the “Share” item.
Here is a quick explanation of the screen shot above. “om64Perl” comes out of our OpenMake CVS repository. The ones attached to Git are pretty obvious, with the word “Git” clearly to the right of the project name. Because Git is a distributed repository tool, the Git repository that the projects are attached to is actually in the workspace. Then, I have an anemic open source project on SourceForge to which the “PerlSCM” project is attached via Subversion. And, finally, there is the Perl VCI project “vci” that uses Bazaar.
There you go. Because I’m involved with three open source projects that use different file control tools, and regular work that uses another, I end up with four.
13 Oct
I’d like to see more users on Git and to do that it needs to have a robust Windows client like TortoiseCVS and TortoiseSVN. It turns out there is such a client, called Cheetah.
MinGW is a port of the GNU compiler tools to Windows, and msys is the minimal UNIX-like shell environment that goes with it. When you download Git to a Windows machine it includes msys and all the awesome *NIX commands and filters I can’t live without. The corresponding project page for Git running with msys is here: Git on Google Code.
In order to promote Git, I joined the msysGit Google group and offered my help to Johannes Schindelin. It looks like they might need some help managing requirements and bug tracking.
Also, weird karma – my first job after college was a summer job at the Max Planck Institutes and Johannes is there now.
10 Oct
I’m going to contribute my CA Harvest knowledge to the Perl VCI module. Max Kanat-Alexander, who runs that project, uses the Bazaar code control tool for it. So far I haven’t used that one, but I’m all up for it.
I was wondering how many code control tools I’ve used. Here is a list and a tally:
SCCS, RCS, PVCS/Version Manager, Endevor Workstation (RIP), Endevor for UNIX (RIP), Endevor mainframe, CVS, Subversion, CA Harvest, MKS Source Integrity, Perforce, Git, Microsoft Visual SourceSafe, ClearCase, StarTeam, Serena ChangeMan for Distributed Platforms (RIP), Serena Dimensions. Total 17 – only 17?
There are a couple more tools that I saw or downloaded but did not actually use, like Microsoft’s Team Foundation Server, IBM’s CMVC (nearly RIP) and Aldon’s Lifecycle Manager for AS/400.
5 Oct
I led a session at BarCamp Milwaukee this weekend on the Git code control tool. I prepared a look-at-my-laptop presentation for the 4 people who had signed up by Friday. At the appointed time about 30 people (about a quarter of the conference attendees) showed up to a room with no projector. Now, that’s the kind of thing to keep you on your toes!
Several of the developers knew the tool better than I did and so I became the discussion leader. We talked about the basics, distributed development, branching, the Eclipse plug-in and suitability for the enterprise (the verdict was “yes, it is”).
In general, a lot of people believe Git is superior to CVS, Subversion and even ClearCase. Git has advantages in checkout speed and branch support, and is better for supporting builds. It is fundamentally different in that it supports a distributed development model. But it is similar to CVS and Subversion in that it is basically a command-line tool with little GUI support (compared with tools like Perforce, StarTeam and AccuRev), and it lacks the enterprise integration and reporting capabilities of high-end SCM tools like Team Foundation Server, Serena Dimensions, IBM Jazz and CA Harvest.
There were also forklift-driving and build-and-take-home-your-own-robot sessions, in addition to sessions on functional programming and PostgreSQL.
15 Jul
I’ve been considering the management of our services code under Git. It seems that the support of the distributed development model fits perfectly with sharing and developing code, mostly Perl, among multiple sites (consultants and/or customers). It allows us to keep a primary repository under our own control, but it also allows an on-site consultant to clone a repository and either enhance or customize or both while on site. After the consultant leaves, the customer would be able to choose to receive updates from our on-line repository on GitHub, for example, or not. They could also contribute enhancements, or not, and we can decide if we want to accept any changes they pushed, or not, or futz around with them first.
A consultant could make both enhancements and customizations and as long as they are in separate commits, we can cherry-pick the enhancement commits into our master branch. Pretty cool stuff.
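To make that concrete, here is a small self-contained sketch. Everything in it is made up for illustration: the function name cherry_pick_demo, the branch name onsite, the file names and the commit messages. It builds a throwaway repository, makes one enhancement commit and one customization commit on the consultant’s branch, and then cherry-picks only the enhancement back onto the trunk.

```shell
#!/bin/sh
# cherry_pick_demo: runs in a subshell so the cd and exports stay local.
cherry_pick_demo() (
    repo=$(mktemp -d) || exit 1
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    git init --quiet "$repo"
    cd "$repo" || exit 1

    echo 'base' > tool.pl
    git add tool.pl && git commit --quiet -m 'base'
    trunk=$(git rev-parse --abbrev-ref HEAD)   # master or main, per local Git defaults

    # The consultant works on site, keeping the two kinds of change apart.
    git checkout --quiet -b onsite
    echo 'general fix' > fix.pl
    git add fix.pl && git commit --quiet -m 'enhancement: general fix'
    enhancement=$(git rev-parse HEAD)
    echo 'customer path' > site.cfg
    git add site.cfg && git commit --quiet -m 'customization: customer path'

    # Back on the trunk, take the enhancement and leave the customization behind.
    git checkout --quiet "$trunk"
    git cherry-pick "$enhancement" > /dev/null
    git log --format=%s                        # commit subjects now on the trunk
)

cherry_pick_demo
```

This only works cleanly because the enhancement and the customization live in separate commits, which is exactly the discipline described above.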
Some of our customers have strict controls over what executables they allow to be installed on their machines, and they may not allow the Git executable client. However, one can clone a repository onto a USB drive and make modifications to the work tree there. This would appear no different than editing files outside of version control. After the edits are done, the USB key can be returned to a machine with a Git client, the changes added and then committed to the repository on the USB key. Those changes in turn could then be pushed to the on-line repository. So, a sort of open source development could be done without violating the customer’s security policies.
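Once the USB key is back on a machine with a Git client, the remaining steps are short. This is a minimal sketch under stated assumptions: the function name commit_usb_edits is hypothetical, the clone on the key was made from the on-line repository (so its remote is named origin), and your Git identity is already configured.

```shell
#!/bin/sh
# commit_usb_edits DIR MESSAGE
# Record the edits made in the clone at DIR (for example the mount point of
# the USB key) and push them to the repository it was cloned from.
commit_usb_edits() {
    usb_clone=$1
    message=$2
    git -C "$usb_clone" add -A &&                      # stage edits, new files and deletions
    git -C "$usb_clone" commit --quiet -m "$message" &&
    git -C "$usb_clone" push --quiet origin HEAD       # current branch to the on-line repository
}

# Optional command-line use: sh commit_usb_edits.sh /path/to/usb/clone "message"
if [ $# -eq 2 ]; then
    commit_usb_edits "$1" "$2"
fi
```

Nothing here runs on the customer’s machine; only the file edits happened there.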
3 Jul
With the Web 2.0 evolution, information flow between people has changed from a ‘push’ paradigm (I send you an email) to a pull paradigm (I follow you on Twitter). How could this possibly relate to code management such as branching, merging and history? Well, Git’s distributed repository model and how one obtains code updates from “friend” repositories is similar to Twitter and how you obtain status updates on the people you choose to follow. Instead of communicating micro-blog entries or status updates, Git is communicating source code branch updates.
Also, just as Facebook or Twitter lets you specify a person’s name in lieu of the communication protocol identifier (email address or web page), Git uses aliases for long repository locations so you have a more direct, natural-language and human feel to what you are doing: “git fetch linus” will fetch changes from Linus’ repository, whose location you have only had to define once.
Here is a scenario where Steve and I are working on a part of the Linux file system to provide information useful for build management and dependency tracking, which Meister and other tools can take advantage of. Steve started by cloning the master Linux repository and started working away making changes. Steve asked me to work on another part of this project, so I cloned his repository, allowing me to pick up all his changes. I am now automatically following (Git calls it remote-tracking) Steve’s “master” branch of his repository since I started my repository by cloning his. The “master” branch is a.k.a. the “trunk” code stream. I can pick up his updates periodically with:
$ git pull
Now, I may also want to get updates directly from the master Linux repository, but it has a complicated URL that I won’t remember and only want to look up once. So, as a one-time command I do:
$ git remote add linux-nfs git://linux-nfs.org/pub/nfs-2.6.git
Forever after:
$ git fetch linux-nfs
* refs/remotes/linux-nfs/master: storing branch 'master' ...
commit: bf81b46
The “fetch” command doesn’t put the master Linux changes directly into my workspace, but off to the side for me to examine first (very nice). If I want, I can accept the changes into my local work tree. To tell me which repositories I am following (which friends), I do:
$ git branch -r
linux-nfs/master
steve/master
origin/master
“origin/master” is the trunk of the repository I originally cloned (Git names that remote “origin” by default). I could also get the full repository information associated with the short names, but as long as it works, I don’t want to know what it is. For me, this type of friendly and fluid interaction with repositories is one of the major advantages over CVS and Subversion.
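For the curious, the short names can be expanded back to full locations at any time. The sketch below wraps the two commands in a hypothetical function, show_remotes_demo, that sets up a throwaway repository and reuses the linux-nfs alias from above; in your own repository you would just run the last two commands.

```shell
#!/bin/sh
# show_remotes_demo: runs in a subshell so the cd stays local.
show_remotes_demo() (
    repo=$(mktemp -d) || exit 1
    git init --quiet "$repo"
    cd "$repo" || exit 1
    git remote add linux-nfs git://linux-nfs.org/pub/nfs-2.6.git  # no network access needed

    git remote -v                             # every short name with its URL
    git config --get remote.linux-nfs.url     # the URL behind one short name
)

show_remotes_demo
```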