Sean Blanton

Agile Build, CI and Testing Automation

Archive for the ‘SCM Strategy’ Category

I attended a lunchtime set of lightning talks at ThoughtWorks yesterday (11/11/09) in downtown Chicago. It was great, but surprisingly they had trouble nailing down compelling arguments for adopting Agile practices. I wanted to help them by adding a business perspective to “trust me, you’ll like it better,” and summarizing a few other topics of the day interpreting them in terms of my own experience.

They also dispelled the belief by some audience members that Agile is a radical new way of doing things, and it correctly came out that many practices were developed previously, even on the mainframe. I am not taking anything away from the leaders in the Agile community who have popularized many best practices, encouraged collective adoption and developed specific ways of incorporating them into coherent software engineering frameworks.

I realized I myself ended up doing a variant of extreme programming (XP) without knowing anything about it through application of common sense optimizations and best practices. Hopefully my notes here will help people understand why certain engineering practices are advocated by XP. One day I ended up being a development lead with no knowledge and a blank slate and this is what happened…

Code Quality Beyond – “It Works”

Erik Dornenberg mentioned that code should be easy to change and ThoughtWorks CEO Becky Parsons mentioned that you should pay attention to testing when writing code. My own mantra is “good quality code should be easy to change, easy to test and easy to debug.” The relative priority of each is determined by business needs.

Test Driven Development

In my environment, I often was given requirements that were simple technical specifications. For example, I had to ensure that some value followed a naming convention. The best way to test that the code met the specification was to encode the specifications into the unit tests and then write the code so that it passes the tests. How else would you do it? The optimization here is that I didn’t want to waste my time writing code that didn’t pass a unit test, so I wrote the test first. Possibly I found this out by trial-and-error.
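A minimal sketch of that test-first flow, in Python rather than the Perl used on the project. The convention itself (a lowercase letter first, then lowercase letters, digits or underscores, 30 characters at most) and the name `is_valid_name` are invented for illustration. The point is only that each test restates one clause of the specification, and the function exists to make the tests pass.

```python
import re
import unittest

# Hypothetical convention: lowercase letter first, then lowercase letters,
# digits or underscores, 30 characters at most.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,29}")


def is_valid_name(value):
    """Return True when value follows the (assumed) naming convention."""
    return isinstance(value, str) and NAME_PATTERN.fullmatch(value) is not None


class NamingConventionTest(unittest.TestCase):
    # Each test restates one clause of the specification.
    def test_accepts_conforming_name(self):
        self.assertTrue(is_valid_name("build_server_01"))

    def test_rejects_leading_digit(self):
        self.assertFalse(is_valid_name("1st_server"))

    def test_rejects_uppercase(self):
        self.assertFalse(is_valid_name("BuildServer"))

    def test_rejects_overlong_name(self):
        self.assertFalse(is_valid_name("a" * 31))
```

Saved as a module, the tests can be run with `python -m unittest`. Written first, they fail until `is_valid_name` is filled in, which is the whole idea.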

Automated Testing

Duh – I had 300 unit tests for my codebase. Am I going to run each of them manually? We did have sufficient complexity that I worried that one change might break something seemingly unrelated, so I made sure all unit tests were run before promoting the code for deployment. So, no changes could be deployed without all unit tests being run against the code.

Continuous Integration

I’ve also done a lot of consulting in the application lifecycle management space, and we didn’t get too deep into developer practices. But since we were implementing version control with automated build, we always introduced continuous integration – we just didn’t have that nice tidy term. One day, I hope to thank Martin Fowler (who gave one of the lightning talks) for popularizing/coining the term so I don’t have to fumble with a longer explanation. Unfortunately, when I now say “continuous integration,” I’m met with a blank or embarrassed stare (“I feel I should know what that is, but don’t”) so I still have to go through the full explanation.

On my own team, I wanted to know about integration problems as soon as possible and I wanted to ensure unit tests were written for code changes by other developers. So, I had unit tests automatically run prior to code check-in. A failing or missing unit test meant the code could not be checked in. Well, I lost that battle and had to relax the enforcement to mere warnings. I even had a unit test generator to make it easy to pass.
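The check-in rule above, including the fallback from hard enforcement to warnings, could be sketched roughly as follows. This is Python, not the original Perl tooling, and `review_checkin`, `CheckinReport` and the shape of `test_results` are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CheckinReport:
    allowed: bool
    problems: list = field(default_factory=list)


def review_checkin(changed_modules, test_results, enforce=True):
    """Decide whether a check-in may proceed.

    changed_modules: names of the modules in the change set.
    test_results: dict mapping module name -> True (tests passed) or
                  False (tests failed); a module with no entry has no tests.
    enforce: True blocks the check-in on any problem; False only warns.
    """
    problems = []
    for module in changed_modules:
        if module not in test_results:
            problems.append(f"{module}: no unit test found")
        elif not test_results[module]:
            problems.append(f"{module}: unit test failed")
    # With enforcement relaxed, problems are reported but never block.
    allowed = not problems or not enforce
    return CheckinReport(allowed=allowed, problems=problems)
```

With `enforce=True`, a change set touching a module that has no test comes back with `allowed` set to `False`. With `enforce=False` the same problems are listed but the check-in goes through, which is the "mere warnings" mode.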

BTW, this was all Perl and you can have integration issues there just as in C# or Java. Integration problems often have to do with namespace and interface changes, and those issues come up in Perl also. Perl also gets compiled prior to runtime, so you can also break the Perl build. There is just not a separately executed build step (ok, ‘perl -c’ – maybe I’ll try that next time).

Pair Programming

We sort of half-implemented pair programming and code reviews because the engineers were not familiar with Perl best practice coding. They were also not familiar with the architecture and standard patterns I invented and had in my head alone. Yes, another nasty real world situation. Had I introduced actual “pair programming” it would have been better than me looking over someone’s shoulder and trying to convince them to use best practices to no avail.

Had I introduced actual pair programming, I would have had to immediately answer to the project manager who would want to know why it was taking two people to make each change instead of one (sure there are twice as many changes, but…). This was the hottest topic of the day at ThoughtWorks, but no one in the audience articulated the question that succinctly.

Neal Ford, in his talk, told us that, with pair programming, it took 15% longer to code the changes, but it resulted in 15% fewer defects. Then came “trust us, that’s better.” You were almost there, Neal. We need cost and timeline justifications for the management to accept pair programming. The one thing I’ll add to the below analysis is that probably most of the 15% defects would get caught in pre-release testing for non-pair programming, but let’s say one would get through to production.

Cost Justification for Pair Programming

Find out the cost of a production defect. At the low end, it is a few $10k’s, at the high end it’s millions. Let’s say with pair programming, we prevent one defect that would cost $50k plus a lot of stress.

Then you have on top of that the cost of finding the other defects in QA, repairing them and possibly impacting your timeline. That’s got to be worth at least $50k for a typical business app.

Now compare with the 15% extra time it takes to make the changes in pair programming. That’s 15% extra time for two programmers over, let’s say, a two-week iteration. One of our customers uses $65/hr to estimate the cost of salary AND all the IT resources consumed by a developer, so let’s use that. The cost of extra development time is:

80 hrs/developer * 2 developers * 15% * $65/hr = $1560

That is nothing compared with $100k worth of problems, and still worth it even if I’ve wildly overestimated the cost of the defects.
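The same arithmetic as a small Python sketch, so the inputs can be swapped for your own numbers. The function name and parameter names are illustrative, and the two $50k figures are the rough assumptions made above, not measured data.

```python
def extra_pairing_cost(hours_per_dev=80, developers=2, extra_percent=15, rate=65):
    """Dollar cost of the extra coding time over one two-week iteration."""
    return hours_per_dev * developers * rate * extra_percent / 100


extra = extra_pairing_cost()
# Assumed: one production defect avoided, plus QA find-and-fix work avoided.
avoided = 50_000 + 50_000

print(f"extra cost of pairing: ${extra:,.0f}")   # $1,560, as computed above
print(f"avoided cost is about {avoided / extra:.0f} times the extra cost")
```

Pairing pays for itself in this model whenever the avoided cost exceeds `extra`, so the defect estimates can be off by a wide margin before the conclusion changes.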

Timeline Justification for Pair Programming

To calculate the cost of the extra test and repair work, you really need to know how long all that would take and what the extra effort is, because time is money. All you have to do is take 15% of 80 hours (12 hours) and compare that to the time it takes to test, repair and re-test. It is probably worth investing the extra time for pair programming.

Interleaving Design and Coding

When I became a dev lead, I was supposed to do design and let the junior members code, but I could not do one without the other. I couldn’t articulate why, but I did both and it worked out. Now, we hear that is a best practice, and to wait as long as possible before doing the design and design while coding. I’d still like a well-articulated argument about why it works.

Effective Communication

Communication breakdowns are a common reason for project failures. Features of Agile methodologies such as the stand-up meeting, well-defined artifacts and having testers and business analysts interacting very closely with developers are ways to facilitate communication. Discover (the credit card company) has a Discover Lean initiative that puts business analysts and developers in the same room, accomplishing the same thing.

Summary

I was also happy to hear the experienced ThoughtWorks folks espouse an adopt-what-works methodology rather than follow a specific methodology religiously. That is very good advice in my view. After all, I was very successful with IXP – Ignorant Extreme Programming.

It is just plain fun to run parallel workflows and builds and watch the activities and build steps light up the workflow monitor in real time like a Christmas tree. See this flash demo to see what I mean.

As customers go to machines with more and more cores, fewer machines are needed in the application lifecycle infrastructure, particularly for builds and code retrievals – the most resource intensive functions. This is helping to simplify the infrastructure, reduce maintenance and administration and drive down costs.

Several of our customers are running around 5000 builds and non-build workflows per month on two machines. The primary reason for two machines, in fact, is for disaster recovery, and the goal is to run both machines at less than half capacity so that in the event that one machine (or datacenter) fails, all the current capacity can be run as a contingency on the one machine that’s left.

Thread control is very simple with Meister and Mojo. Both use the omsubmit dependency manager program to handle this. Meister’s om program translates build events into workflow steps using omsubmit. The OMSUBMIT_MAX_USER_PROC value sets the maximum allowed number of threads.

You might think that if you are running dual quad-core build machines, you should set the max threads at 8. However, Meister posts build operations to one thread and the associated logging operation to another. Compile operations notoriously use a lot of memory and CPU resources, but the logging operation posts to a server and waits for the operation to complete. There is really no disadvantage to setting the max threads higher than 16 in this case, so go ahead and do it.

As a non-build workflow example, I worked on JBoss deployments to 48 Linux machines. The workflow was parallelized into 48 activities each of which deployed to a single machine in parallel. The deployment activity was largely a remote execute operation that extracted archives on the remote machine. The extraction took about a second for a medium sized application. Again, this is a waiting situation where machine resources are essentially idle while the thread is in use, so use more threads. The machine was a dual, dual-core build machine and we set OMSUBMIT_MAX_USER_PROC to 50.

Watching the workflow monitor as the deployment ran, we could see roughly half of the machines light up (meaning actively running) at any one time and the entire deployment process synchronized all 48 machines in a little over two seconds.

So, don’t simply match your machine’s CPU threading capabilities – overclock! Aim high for max threads and try to determine where your performance is optimized. I’d love to provide you with some metrics as a function of thread count, but usually once something is working it’s on to the next project. I barely have enough time to blog!
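Lacking real metrics, here is a toy Python sketch of why waiting-dominated work benefits from more threads than cores. It does not use omsubmit or OMSUBMIT_MAX_USER_PROC at all: the 48 host names are made up, and `time.sleep` stands in for the remote extraction, so the timings only illustrate the shape of the effect.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TARGETS = [f"host{n:02d}" for n in range(1, 49)]   # 48 hypothetical machines
WAIT_SECONDS = 0.05   # stand-in for the remote extraction wait


def deploy(host):
    """Pretend to deploy: the thread only waits, as a remote execute would."""
    time.sleep(WAIT_SECONDS)
    return host


def run_deployment(max_threads):
    """Deploy to every target with at most max_threads in flight.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        done = list(pool.map(deploy, TARGETS))
    assert len(done) == len(TARGETS)
    return time.perf_counter() - start


if __name__ == "__main__":
    for max_threads in (4, 16, 50):
        print(f"{max_threads:>2} threads: {run_deployment(max_threads):.2f}s")
```

Because each worker spends its time idle, the run with 50 threads finishes in roughly one wait period, while the run with 4 threads needs about twelve. CPU-bound compiles would not scale this way, which is why the right ceiling has to be found by trying.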

It’s very common to have a code check-out step be part of an integration build. Far better it is to not check out code before a build. What? How is that possible?

Let me explain, Fred. The simple approach most of us take (and have to take when getting things started) has developers commit, commit, commit, and when it is time to deploy, check out the code, do a build, and then deploy the application. There is room here for both problems and optimizations. Doing a full check out of the code tree is more costly in terms of time than checking out only what has changed. Updating the code tree with a single commit is less costly than updating with a large number of commits.

You may be limited by the technology in-hand and how much you’ve invested in learning the technology and possibly customizing it. For example, if your file control tool can only do a full check out of a source tree, or that’s the only command you had time to implement in order to meet the deadline, or you don’t trust your tool to do incremental updates, then you are basically running the longest builds possible.

On the other hand, if you could update the code tree every time a developer does a commit with only the changed files, then you are ready to execute a build at any moment. This requires some deft manipulation of your file control tool, and that’s why you don’t see it more often.
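The difference between the two approaches can be sketched in Python with plain directories standing in for the repository and the build tree. Both functions and the idea of a post-commit trigger handing over lists of changed and deleted paths are assumptions for illustration; a real file control tool supplies its own commands for this.

```python
import shutil
from pathlib import Path


def full_checkout(repo_dir, tree_dir):
    """Rebuild the build tree from scratch: cost grows with the size of the tree."""
    repo_dir, tree_dir = Path(repo_dir), Path(tree_dir)
    if tree_dir.exists():
        shutil.rmtree(tree_dir)
    shutil.copytree(repo_dir, tree_dir)


def apply_commit(repo_dir, tree_dir, changed, deleted=()):
    """Bring the build tree up to date after one commit.

    changed and deleted are relative paths reported by the (hypothetical)
    post-commit trigger; only those files are touched.
    """
    repo_dir, tree_dir = Path(repo_dir), Path(tree_dir)
    for rel in changed:
        target = tree_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(repo_dir / rel, target)
    for rel in deleted:
        (tree_dir / rel).unlink(missing_ok=True)
```

Calling `apply_commit` after every commit keeps the tree permanently build-ready, and its cost depends on the size of the commit rather than the size of the source tree.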

You might think “continuous integration” will take care of this. Developer commits, update checked out, build is run. However, you may end up with a build, test execution and deployment that takes longer than the typical time between developer commits. You still have to do incremental updates and it only solves the problem in cases with very low developer activity.

I’d like to point out one tool that does an excellent job of post-commit code checkout, CA Software Change Manager for Distributed. CA SCM (for short) is the tool, formerly known as Harvest, from the company formerly known as Computer Associates. CA SCM is a highly scalable (1000’s of developers) file control tool with a great lifecycle process model. We at OpenMake Software have our very first customer still using OpenMake/Meister with CA Harvest/SCM after 11 years. While we have a reseller arrangement with CA, our partnership with CA in services has extended to 14 years.

About 10 years ago, OpenMake Software developed an integration with what was then Platinum Technologies’ Harvest product, modeled after the now dead Computer Associates product, Endevor Workstation, which had an excellent post-action code tree update. (Endevor for z/OS, a.k.a. CA SCM for z/OS, is still very popular and has a similar functionality called ‘output libraries’ – following all this?) Our integration had the horrific name, ‘Har-refresh’.

As product partners, we finally transitioned Har-refresh from an external add-on to CA, who have turned it into a core functionality of the product, called HRefresh (a better name). Rather than simply a post-commit check out, HRefresh updates the code tree after any action that updates a dynamic code view. This includes renames, deletions, commits and code promotions and demotions. We like this because CA SCM does all the work and we cherry-pick sets of up-to-date code trees to build up an application source code stack for a build. We align Meister dependency directories with HRefresh-managed file system directories for a tight SCM (software configuration management) build.

This mechanism distributes the resource load for checking out code to times when builds are not required. It’s true that often times people want to build as soon as their code is checked in (or promoted), but on average it is a very big net win reducing build times.

This is just an example of the type of sophistication that is out there to prevent pre-build code check outs and save time on your builds.

One of OpenMake Software’s product strategies is to keep things simple. Build management is one of the most complex operations in all of the IT world, and one of our key benefits is to simplify, organize and automate the build process for development, testing and production.

We’ve seen a trend among our customers to simplify their build management infrastructure by going to fewer build machines with more CPU cores. Builds in particular use relatively more CPU resources than other resources as code is interpreted and compiled in memory and then finally written to disk. By reducing the total number of machines, rack space, procurement, administration and other IT overhead costs are reduced at great cost savings per machine eliminated.

Recently, I was at one of the big chip makers where they used dual quad-core CPU Linux machines for their development and builds. They had two machines and were able to control access to allow separate areas for development, testing and release builds in keeping with best practices. Having all the horsepower of 8 CPU cores on a single machine kept them from needing more machines.

Another customer does 6000 builds per month with Meister on just two build machines.

IBM, when selling BuildForge, likes to talk about big build server farms, because their tool does remote execution on multiple machines, as does Meister. However, BuildForge does not do builds at all. It can remotely execute your existing build scripts, but there is little real value add to that. BuildForge is also famously expensive. What happens over the next few years to the high investment in multi-machine remote execution software as the number of machines declines, perhaps dramatically?

A similar argument can be made for Electric Cloud’s Electric Accelerator product. It’s possible in some cases for C/C++ builds to gain an edge by pushing a compile operation to another machine, and then bringing it back. You would only do this to gain access to additional CPU resources. In the past, you might have 8 build machines that Electric Accelerator would farm operations out to. Now, you can pull all those operations into a single machine and there is no need for that functionality. Also, you are stuck with converting your GNU makefiles into other GNU makefiles.

Meister is optimized for multi-core CPU build machines and offers multi-threaded capability to both build events and non-build workflow events. You know where your build is and there are fewer dependencies on network resources. Both BuildForge and Electric Accelerator add overhead to build administration to coordinate across multiple machines – a dying practice that no organization wants to invest in. Meister is the best bet for a future with fewer build machines with more horsepower.

Finding the blog Enterprise Maven made me decide to go back to the basics, today. This blog is from 2006, but the best practices of production control ignored here go back decades. I’d like to point out that Oleg Gusakov, the author, wrote the blog in a very good spirit and seems like a nice guy. He just seems to be a bit naive about what’s been happening with software development in the enterprise.

In the first section, he assumes that the only enterprise build and deploy solution is one that is customized, while OpenMake Meister has been serving that role now for 12 years. He does correctly conclude that all the enterprises in the world should not be independently investing in the same type of build and deploy solution. It is a costly investment and this functionality should be productized. That’s exactly why we did it and why that is still one of our chief selling points.

He is right that developing a product that should be commoditized is a drain on the business. However, the converse, having a commercial product provide the functionality at a greatly reduced cost compared with one homegrown, provides a competitive advantage over those companies who don’t have such a product.

Through the middle of the article, again, I think Oleg is unaware of the heavy horse SCM products out there that provide a lot of the expected functionality. Tools like CA Harvest, Serena Dimensions and others are very complex and sophisticated n-tier products. They nevertheless do not provide build support, so by combining an enterprise file control tool with an enterprise build and workflow tool, Meister, you cover the required functionality.

Lastly, regarding the enterprise development lifecycle, he is right it is an oversimplification. I like his phrase that he hopes to “grow the meat.” At OM Services, we have “fully grown meat” and the enterprise lifecycle documents that we develop with our customers and clients are typically 50-80 pages in length. Here is where I review the generally accepted best practices, going back to the seventies with mainframe development. (NO, distributed platforms are not somehow different in the high level process!)

  1. It all starts with production control. Developers do not have access to production due to a fundamental conflict of interest. Maintaining business continuity trumps developers’ ease of delivery to production.
  2. Since someone else puts the code into production (or operates a tool which does so), this is the basis for separation of roles and responsibilities in the enterprise software development lifecycle.
  3. To ensure integrity of the production environment, the production build must be done by a group representing the business, not development. Developers do not do production builds.
  4. Working backwards, if you want your test environment to be as close as possible to production, you lock this down and prevent access to developers. This is usually the QA testing environment.
  5. Again, to avoid a conflict of interest, the QA testers should be working for the business, not the application development team.
  6. And it follows that the build for the QA environment is done by the business.
  7. The developers’ job from this perspective is delivering source code to the business, which wants to retain the source code and the ability to use it (meaning they can build it).
  8. The process of developers transferring source code to the QA build people was called “throwing it over the wall”. Now, the heavy horse SCM tools and Meister workflow make it easy to do this and allow variations of iterative development involving the QA environment.

Any type of continuous integration or agile development practice typically happens before the QA environment. Any development methodology for the enterprise must take into account the fundamental conflict of interest between software change delivery and business continuity, or ignore it and remain entirely in front of QA.

If you are a developer, you can think of this as a loss of privilege, or you can be elated that other people are doing the dirty work for you and you can focus on the art and science of engineering business solutions. If you are really depressed, maybe you should be on the other side of the wall!

How to Improve Your RFP Process

I’ve been involved with software procurements that involve RFP’s (Request-for-Proposal) on both sides of the fence – as part of the purchasing organization, and more frequently as a software vendor. I’ve seen the mistakes people make in sending out an RFP and then making a purchasing decision based on the results, and I’m filing those items away should the time come for me to head up a software purchase myself.

RFP’s work best when you need a product that is strongly commoditized. For example, you need a software package to manage your purchasing. Or, you need software to perform all of your HR (Human Resource) functions. HR functions are pretty much the same at most companies – sure, larger companies might have needs for scalability and breadth of functionality that smaller companies don’t, but it’s all HR.

Before I get too far into my experience, let me mention that this is not a sour grapes article. Meister does great in sales with RFP’s and we invariably win or come in a close second. However, in some of the “close second’s” we’ve lost to a product that we would not regard as a competitor and we can see that the purchaser has not satisfied some of their stated key requirements at the beginning of the process that got us involved. It has made me wonder and this article is the result.

The first way to screw up an RFP is to make it a democracy – keep it an oligarchy of stakeholders. If you start out with a need for an HR solution and you solicit requirements from everyone in your company, you might get requirements like “needs to do purchasing” and “needs to do supply-chain management”. If you then invite more people from the purchasing department to participate and then have everyone score according to the sum-total requirements, you may very well end up with a purchasing solution when your original goal was an HR solution. In this case, there was a lack of weighting for HR requirements and HR stakeholders’ votes.

Some old fashioned leadership can work here, where the key stakeholder makes the final decision and is accountable for it, taking into account everyone’s scorecard. Sure, this is an extreme example, but it illustrates my point about requirement dilution clearly.
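The dilution effect is easy to see with numbers. The Python sketch below uses entirely hypothetical scorecards (0 to 5 per requirement) and a hypothetical weighting of 3x for the requirements that match the original HR goal; `total_score` and `pick_winner` are names invented for the example.

```python
def total_score(scorecard, weights=None):
    """Sum requirement scores, multiplied by a per-requirement weight (default 1)."""
    weights = weights or {}
    return sum(score * weights.get(requirement, 1)
               for requirement, score in scorecard.items())


def pick_winner(products, weights=None):
    """Return the name of the product with the highest total score."""
    return max(products, key=lambda name: total_score(products[name], weights))


# Hypothetical scorecards after everyone's requirements were added to the RFP.
products = {
    "HR suite": {"payroll": 5, "benefits": 5, "purchasing": 1, "supply chain": 0},
    "Purchasing suite": {"payroll": 2, "benefits": 1, "purchasing": 5, "supply chain": 5},
}
hr_weights = {"payroll": 3, "benefits": 3}   # assumed weighting for the core need

print(pick_winner(products))              # every requirement counts the same
print(pick_winner(products, hr_weights))  # HR requirements weighted 3x
```

Unweighted, the purchasing suite wins 13 to 11 even though the goal was an HR solution. With the HR requirements weighted, the HR suite wins 31 to 19.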

The second way to screw up an RFP is to limit the value you can get from a procurement. Let’s say you send out an RFP for an HR solution and one of the vendors says they can do financials as well. You could say, well, we are only looking for an HR solution (that could very well be the case, but let’s say there is no solution for financials in place). You could talk to the guy in charge of financials and let him know there is an opportunity. My first point is relevant here because really the software products are no longer commodities. Either you should open the RFP up to vendors who can do HR AND financials or make a decision based on thorough investigation of the functionality with management consensus about the overall benefit of each product to the organization. In this case, it’s more opportunity lost, but finding opportunities and bringing them forward is how people win leadership awards (or keep their jobs in a rough economy).

To summarize, have clearly defined business needs and requirements and stick to those when making your decision. If you find you are trying to choose between apples and oranges, step back and regroup with management to determine the overall value of each product to the organization. In our industry, single tools that provide functionality in SCM, development and IT infrastructure are hardly commoditized today and have small overlap with one another.

I suppose it boils down to having clear requirements, stakeholder involvement and effective leadership. But, isn’t that always the case?

Mojo and Meister 7.2.1 On The Way

We’ve been testing Mojo and Meister 7.2.1 and getting ready for their release on December 15. This is a maintenance release with bug fixes from users who’ve started running builds and workflows with Meister 7.2 and using the free workflow automation of Mojo and putting those releases through their paces. It also contains a lot of UI and documentation improvements.

Existing users of the 7.2 version of either Mojo or Meister will be able to upgrade via the update sites, http://www.openmakesoftware.com/mojo/update_site and http://www.openmakesoftware.com/meister/update_site, respectively.

Users interested in getting Mojo, the free workflow automation tool, and Meister, the industry leading build automation tool, can find download instructions on our website, http://www.openmakesoftware.com.

With the recent financial meltdown, I couldn’t help but notice a trend among my clients. I’ve worked with over one hundred companies in one capacity or another, which has given me an insight into how they develop software.

Among these companies, there were two particularly frustrating companies where I was on site and a third that I assisted with a very difficult proof-of-concept. I actually compiled software applications for each of these companies. Each of them failed to implement the enterprise software process and automation improvements I was helping them with, and none of the three exists today – victims of risky investment practices and high-profile failures in the 2008 financial crisis.

To be sure, I’ve had many frustrations at many other companies because implementing centralized software development management practices is extremely difficult, involving almost every department in IT. But the other companies ultimately gained the consensus, management backing and financial support to implement real change to lower the risk of software delivery and improve business continuity.

The three software management failures that ultimately turned business failures were particularly sore spots for me. And, as a trained physicist, when I see three software management failures and three business failures and they are the SAME three out of a hundred, well, I know there’s a very high probability for a relationship.
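A back-of-the-envelope version of that reasoning, in Python. It assumes exactly three of the hundred companies failed as businesses and that, if there were no relationship, any three would be equally likely. It ignores obvious confounders, such as all three sitting in the same industry, so treat it as an illustration rather than a proof.

```python
from math import comb


def chance_same_set(total, failures):
    """Chance that `failures` companies picked at random out of `total`
    turn out to be one particular set of that size."""
    return 1 / comb(total, failures)


ways = comb(100, 3)   # number of ways to choose 3 companies out of 100
print(f"1 in {ways:,}")                       # 1 in 161,700
print(f"{chance_same_set(100, 3):.7f}")
```

Under those assumptions, pure coincidence would produce the same three companies on both lists about once in 161,700 tries.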

An anecdote: one of the three, a super large bank, liked to grow by acquiring other banks. Word on the street is that the OCC stepped in and said you have to improve your software management practices before you can acquire more banks. So, the bank implemented a software management improvement program including first centralized version control and later more proper configuration management (always surprising to me that they can deliver binaries and manage versions but not know if the two are related in any way). I was brought in for the centralized build management part. Then the OCC said something like “We see you’ve implemented some version control. OK, you can go ahead and buy more banks.” POOF! The software improvement projects were all massively scaled back and there was no more enterprise build management to work on. How about that?

When a company implements software development and delivery improvements they are lowering the risk of proprietary software changes which in turn lowers business risk by decreasing interruptions of services and ensuring on-time delivery of new features. At this stage of industry maturity, a company that does not have control over software delivery is accepting a business risk that fewer and fewer competitors accept.

So it’s reasonable to believe that a company that takes large financial risks will take risks across the board – even with their software management practices.

File Control Madness in Eclipse

    I found myself actually using four different file control tool plug-ins in a single Eclipse 3.4 workspace. This is not to show off, but for legitimate needs. Before proceeding, let me disclaim that I am reorganizing my Perl development on a new machine and I have everything somewhat haphazardly in a single workspace. Ideally I will have different workspaces for different projects, but until I build a standard set of preferences, particularly for EPIC Perl templates, and can export and import them into different workspaces, I’m locked into a single workspace for now.

    [Screenshot: Eclipse workspace with projects attached to CVS, Git, Subversion and Bazaar]

    If you are not familiar with Eclipse and version control (or as I call it generically “file control”) you have to install plug-ins that provide the functionality to interface with different tools. I have an EPIC plug-in that provides Perl tools, and I’ve installed EGIT for Git integration and plug-ins for Subversion and Bazaar. The CVS plug-in actually comes as part of the base Eclipse install, though that status is questionable given the popularity of Subversion and the rapid rise of Git.

    These plug-ins provide the capability to create a new project from the contents of the file control repository, or attach an existing Eclipse project to a new project under file control. You do this by right-clicking on the project and going to the “Team” menu and the “Share” item.

    Here is a quick explanation of the screen shot above. “om64Perl” comes out of our OpenMake CVS repository. The ones attached to Git are pretty obvious, with the word “Git” clearly to the right of the project name. Being a distributed repository tool, the Git repository that the projects are attached to is actually in the workspace. Then, I have an anemic open source project on SourceForge to which the “PerlSCM” project is attached via Subversion. And, finally, there is the Perl VCI project “vci” that uses Bazaar.

    There you go. Because I’m involved with three open source projects that use different file control tools, and regular work that uses another, I end up with four.

    People as Glue

    Perhaps you’ve heard of “glue” scripting, which is scripting designed to pull together various tools and processes into an integrated process automation. For example, suppose you have both ClearQuest from IBM and CA Harvest. Neither IBM nor CA has a real stake in integrating with the competitor’s tool, but you do. So what do you do? You create some nifty Perl scripts (because there are no other real scripting languages) to associate Harvest package promotion with ClearQuest record changes.

    Well, we in OM Services have become sort of a people form of the same thing. We’ve had multiple engagements with the same companies, providing a consistent level of expertise and proprietary knowledge of each company’s software processes and automation technology.

    Sometimes we even smooth out transitions from staff turnover and that’s where I think we act as “glue” in time. OK, so I have a physics background and can’t distinguish between time and space, but providing connectivity between two points is sort of what we do.

    The goal of an SCM team charged with build management is to NOT have proprietary knowledge outside the team, but, well, that can be expensive and sometimes impossible even when the funds are there. So, while we strive to meet the needs of our customers through services, we think the tool should provide that bridge, keeping the proprietary knowledge in house, and we in services are constantly feeding back our input into product development to achieve that goal.