In-Depth

XP lessons learned

This article is excerpted from Chapter 31 of ''Extreme Programming Perspectives'' by Michele Marchesi, Giancarlo Succi, Don Wells and Laurie Williams. Used with the permission of the authors and Addison-Wesley.

This article is based on our experience while working for a software outsourcing company that was built around the Extreme Programming (XP) methodology. One difficulty we had in adopting XP was that the original ''white book'' (see Reference 1), though inspiring and persuasive, was somewhat short on detail. We feel it is valuable to the XP community when groups doing real projects provide detailed accounts of their experiences in applying XP. Finally, some issues took on additional significance for us because our client had outsourced their project to us.

Research hypotheses
What facets of XP worked well and were easily accepted by the customer (including some practices we expected the customer wouldn't like)? What practices proved their worth when requirements or conditions changed? What practices were difficult to implement? What modifications did we make to some of the XP practices, and why?

Obviously, we don't have definitive answers to these questions that can be extrapolated across all projects, but our account adds one more data point to the sample.

Description of the context of the experience
Our company was formed around XP, which meant that all our developers and our management were committed to XP from the conception of the company. The developers on this project had worked together on other XP projects for a few months, and some had longstanding working relationships from earlier in their careers, so we had a team that was ready to hit the ground running.

Our client for this project was also a start-up, which meant that they were more open to XP than a larger company with an existing methodology might have been. Many of our customers had backgrounds in development, so they were familiar with the shortcomings of conventional methodologies. They were also at a stage in their development where it was possible for them to remain on-site through most of the life of the project.

The XP-standard environment prescribed by Kent Beck proved its worth in this project (see Reference 1). We often had four pairs (including client developers) engaged on the four workstations in the corners of our development room. And much of the communication during the project happened indirectly as people overheard and responded to what an adjacent pair was discussing. The walls covered with whiteboard wallpaper also worked out well; they always held design scratchings and project notes. Even when we had online data such as bug reports, we put highlights up on the wall because they acted as a physical reminder of our priorities. Our wireless network and dockable laptops worked well, too -- they made it much easier for us to switch pairs and get together in another room for an impromptu design session.

We decided to allocate a specific and separate space for customers outside the development room. This helped them remain productive with their ''real'' jobs while still remaining accessible to the development team. It also meant that they didn't get in the way of the developers. Although we tried to be as open as possible, we also felt it necessary on occasion to meet in a separate room without customers present. Likewise, we made space available for the customers so that they could meet among themselves when they felt the need.

Results from the experience
Billing and contracts: Based on a recommendation from ''Uncle Bob'' Martin, we decided to bill by the iteration instead of by the hour to prevent any squabbling over hours worked. This also eased the time-tracking burden on the developers and made it easier to swap developers when needed. More subtly, we felt that this helped show the client that we were sharing the risk with them and that we were invested in the project.

We found later that this also acted as a feedback mechanism -- overtime is an effective cure for developer overreaching. We developers had no one to blame but ourselves if our optimism resulted in long hours. This can be taken only so far -- we still believe that we must go back to the customer and reduce scope when we find that the estimates are way off the mark.

The contract we agreed on with our client didn't attempt to nail down the deliverables for the seven-week life of the project; that isn't possible when you're going through iteration planning and changing direction every two weeks. We agreed on the size of the team and that we would augment the contract every two weeks with a list of stories for the next iteration.

Iteration planning: We anticipated problems in convincing the customers to narrow iteration scope to a set of specific stories that made sense for both business and development. However, this negotiation actually went quite well -- our customers were willing to listen to our reasons for wanting to do some stories ahead of others (though we tried to keep such dependencies to a minimum) and for giving high-risk stories a large estimate.

The customers loved the quick turnaround of the planning game. They were afraid that it would take us much longer to become productive than it did and were pleasantly surprised that we were able to identify the important stories as quickly as we did.

Team size: Our team size for this project was four developers and one tester, whereas we had identified an ideal team size of eight developers and one tester and usually had at least six developers on previous projects. We found that there were some advantages and disadvantages to this smaller team size. On balance, we'd probably still prefer to have six to eight developers in a team.

Pros:
* We have struggled with whether everybody who is part of the development team should be in the planning game -- progress sometimes bogs down with too many participants, but a lot of knowledge transfer does take place. This was not a problem here with the smaller team size.
* Communication and the coordination of schedules were easier.

Cons:
* It was a little more difficult to recover when a developer had to leave for personal reasons.
* The overhead of tracking the project became a more significant fraction of the total time spent on the project.
* We had a smaller spectrum of skills available among the developers.
* With a smaller number of stories in each iteration, we sometimes found that one of the assigned tasks was a bottleneck because other tasks depended on it.

Stand-up meetings: How do you avoid discussing design during stand-ups? We had repeated problems in limiting the length of our daily stand-ups; the issues raised were often valid and needed fuller discussion.

One device that worked was to write on a whiteboard all the issues that people raised for discussion during the stand-up. Then, after the stand-up was completed, those who needed to be involved in each discussion could split into smaller groups and follow through on the ''promise to have a conversation.''

Another minor innovation was the use of an egg timer to limit stand-ups. We set the egg timer to 10 or 15 minutes at the start of the meeting. Then the person who was talking at any time during the stand-up had to hold the timer in their hand. This acted as a reminder to the speaker to be brief and a reminder to others in the stand-up to avoid any private conversations while someone else had the floor.

We found that the client wanted a much more detailed status than we had planned to supply -- they were used to the traditional spreadsheet view of project tasks with a percent complete for each task. We compromised with a daily status message that summarized the state of the project and the outstanding issues -- most of the work to compile this daily message was done by one person at each stand-up meeting.

Pairing with client developers: We knew from day one that we would need to hand off our code to the client's in-house developers. For much of the project, their technical lead was on-site and worked with us. Four other newly hired developers paired with us for different stretches of the last three weeks.

Pairing met our objective of accelerating knowledge transfer very well. It also helped that XP is a cool new methodology that many developers are eager to experience. But that initial enthusiasm was probably sustained only because these developers were able to make constructive contributions. One developer didn't like the idea of pairing at all and quit when he found out that he would have to pair at least until the code was handed off to the client company.

Our experience with the technical lead was more complex. He was technically very capable and made several suggestions and observations that led us down unexplored paths. However, pairing with him was hampered by the fact that he was playing multiple roles, trying to get the most value for the client's investment while still acting as a developer. Therefore, he tried to steer us toward solving the difficult technical problems that he thought would crop up later, instead of focusing on the stories and tasks immediately at hand.

Finally, at the end of the sixth week, the team captain (our instantiation of the ''coach'' role) and another developer both had to leave the team unexpectedly. We introduced two new developers to the team and were able to deliver all agreed-on functionality on schedule, which further validated the worth of pairing and shared code ownership.

Story estimates: During the first iteration, we felt the natural pressure to please the customer and bit off more than we could chew. We found that our attention to tracking and to promptly creating the acceptance tests, as well as our discipline in sticking to our XP practices, all suffered when we were under the gun. We continued to practice test-first programming but neglected to pair up when we thought that the tasks being tackled were simple enough that they didn't need the spotlight of a continuous code review.

As our customers came to trust us more in later iterations, we felt less pressure to prove that we were delivering value by stretching ourselves to the point that our discipline degenerated. We also learned from our failure to meet the optimistic velocity of the first iteration: We reduced our velocity by about 20% from the first to the fourth and last iteration and felt that the quality of our code improved as a result.

Bug fixes: By the third iteration, a substantial fraction of our time was spent on resolving bugs from previous iterations. We found that we had to take this into account when estimating our velocity. Part of the problem was that our client was new to the idea that we could throw away work when we realized, after it had been completed, that it should be done differently. They saw these issues as bugs -- we saw them as new stories in the making.

The ''green book'' (see Reference 2) suggests that significant bugs should become stories in future iterations. We probably should have tried harder to convince our client that this was the right course to follow -- there's a natural tendency for the client to feel that bug fixes are ''owed'' to them.

One approach to bug fixes that worked quite well was to have one pair at the start of each new iteration working on cleaning up significant bugs -- those the customer had decided definitely needed immediate attention. At times we had significant dependencies on one or two of the tasks in the new iteration. Especially in that situation, we found that it was an efficient use of our developer resources to have one pair working on bug fixes while these foundational tasks were tackled by others.

Overall, we were not satisfied with our handling of bug fixes during this project -- we wanted to convert them into stories, but our customers always felt that they were ''owed'' bug fixes as part of the previous iteration, above and beyond our work on new stories.

Acceptance testing: One thing we realized was the importance of a full-time tester to keep the developers honest. When we did not have a full-time tester for the first iteration, we got 90% of every story done, which made the client very unhappy during acceptance tests -- they perceived that everything was broken. We also found that our tester provided an impartial source of feedback to the developers on their progress.

We made one modification to our process specifically to facilitate testing. Our tester felt burdened by having to ask the developers to interrupt their paired development tasks whenever she needed a significant chunk of their time. So we decided to assign the role of test support to a specific developer for each week of the project.

Because we had multiple customer representatives, we found that a couple of times one customer helped create the acceptance tests, but a different one went through them with our tester at the end of the iteration. This caused many delays for explanations during acceptance testing and some confusion over whether the acceptance tests had been correctly specified. We concluded that in the future we would strongly push for the same customer representative to both help create the acceptance tests and approve them.

Unit testing: Our unit tests proved invaluable in ensuring the quality of our code -- we found on numerous occasions that refactorings in one area of the code caused side effects elsewhere that we caught with our unit tests. Because we relied so heavily on the unit tests and ran them constantly, their running time grew to almost two minutes. Our first response was to do some refactoring to reduce this time. We then made use of the flexibility of Apache Ant's XML configuration to sometimes run only a specified subset of all the unit tests.
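
To illustrate the idea, here is a minimal sketch of that kind of Ant configuration -- the property, target, and path names are hypothetical rather than taken from our actual build file. By default every test class runs; overriding the property on the command line narrows the run to one package while a pair is working in that area.

    <!-- Hypothetical sketch: the default runs every test; override with,
         e.g., "ant run-tests -Dtest.subset=com/acme/engine/**/*Test.java"
         to run only the tests in one package. -->
    <property name="test.subset" value="**/*Test.java"/>

    <target name="run-tests" depends="compile-tests">
      <junit printsummary="yes" haltonfailure="no">
        <classpath refid="test.classpath"/>
        <formatter type="plain"/>
        <batchtest todir="${reports.dir}">
          <fileset dir="${test.src.dir}" includes="${test.subset}"/>
        </batchtest>
      </junit>
    </target>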

During one iteration, we implemented a story that required a multithreaded producer-consumer engine that was difficult to test using JUnit. We created pluggable stubs for each module of the engine so we could test any one module while simulating the functionality of the other modules.
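
A stripped-down sketch of the approach (with hypothetical names -- the real engine had more modules, and the real consumer ran on its own thread) shows the shape of it: the producer depends only on an interface, so a JUnit test can plug in a recording stub in place of the real consumer.

    import java.util.ArrayList;
    import java.util.List;
    import junit.framework.TestCase;

    // The producer sees only this interface, never a concrete consumer.
    interface Consumer {
        void accept(Object item);
    }

    class Producer {
        private final Consumer consumer;
        Producer(Consumer consumer) { this.consumer = consumer; }
        // The real version queued items to a consumer on another thread.
        void produce(Object item) { consumer.accept(item); }
    }

    // Stub that records items instead of processing them on another thread.
    class RecordingConsumer implements Consumer {
        final List received = new ArrayList();
        public void accept(Object item) { received.add(item); }
    }

    public class ProducerTest extends TestCase {
        public void testProducerHandsItemsToConsumer() {
            RecordingConsumer stub = new RecordingConsumer();
            Producer producer = new Producer(stub);
            producer.produce("item-1");
            assertEquals(1, stub.received.size());
            assertEquals("item-1", stub.received.get(0));
        }
    }

The same trick works in the other direction: a stub producer feeding canned items lets a consumer module be tested deterministically, without starting real threads.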

Metrics: As a means of encouraging the creation of unit tests, we wrote a simple script that traversed our source tree daily and sent e-mail with details of unit tests written, organized by package and class.
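
As a sketch of the traversal (the names here are hypothetical, and the actual script also handled the e-mail step), the daily census amounts to a recursive walk that tallies test classes under each directory:

    import java.io.File;
    import java.util.Iterator;
    import java.util.TreeMap;

    // Hypothetical sketch: counts files named *Test.java under each
    // directory, using the directory path as a stand-in for the package.
    public class TestCensus {
        public static void main(String[] args) {
            TreeMap counts = new TreeMap();
            walk(new File(args.length > 0 ? args[0] : "src"), counts);
            for (Iterator it = counts.keySet().iterator(); it.hasNext();) {
                String dir = (String) it.next();
                System.out.println(dir + ": " + counts.get(dir) + " test classes");
            }
            // In practice the output would be mailed to the team, e.g. by cron.
        }

        private static void walk(File dir, TreeMap counts) {
            File[] children = dir.listFiles();
            if (children == null) return;
            for (int i = 0; i < children.length; i++) {
                if (children[i].isDirectory()) {
                    walk(children[i], counts);
                } else if (children[i].getName().endsWith("Test.java")) {
                    Integer n = (Integer) counts.get(dir.getPath());
                    counts.put(dir.getPath(), new Integer(n == null ? 1 : n.intValue() + 1));
                }
            }
        }
    }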

We also used JavaNCSS (distributed under the GNU GPL), which generates global, class, and function-level metrics for the quality of code. We automatically generated these metrics daily and wrote the results to the project wiki to help us determine what parts of the code were ripe (smelly?) for refactoring and whether test coverage was adequate.

In addition to these automatically generated metrics, our tester manually created a graph of the acceptance tests, showing acceptance tests written, run, passed, and failed. This information was available on the development room whiteboard along with the tasks and status of the project. A snapshot of the current state of the project was thus available on the whiteboard, while a more detailed view could be found on our project wiki.

The grade card: After each iteration, we graded ourselves on the main XP practices and a few other aspects of the process that we felt were important (tester-to-developer communication, clarity of the stories, accuracy of estimates). The scores showed us the areas where the development team needed to focus and provided useful and immediate feedback on the process. They served as a check against sacrificing the long-term benefits of sticking with the process for the short-term benefit of churning out more code. We found that our scores improved substantially with each iteration, with the lowest grade in the final iteration being a B-. We made these grade cards publicly available on the wiki, although we did not invite the customer into the grading process. We will consider that step for future projects, at least after a couple of iterations, when some trust has developed between developers and clients.

Object-oriented databases: We were fortunate to be able to use an object-oriented database management system (OODBMS), rather than a traditional relational database management system (RDBMS), which enabled us to treat the domain model as identical to the persistence model and therefore to be agile when refactoring the code. It's much more difficult to refactor the model when the data representation cannot be changed at will.

Delivery day
One aspect of XP that we had to rethink in our circumstances was the amount of documentation that was necessary. Because the client developers would be responsible for maintenance and enhancements, we needed more documentation than for an in-house project. So we put together an overview of the design along with some automatically generated UML diagrams and made sure that all our classes had Javadoc comments. We also added some installation and release documents and a developer FAQ. For most outsourcing situations, this level of documentation is probably necessary.

The final iteration was completed on Wednesday of the seventh week, with the contract concluded on Friday. As delivery day approached, we noticed how different this was compared with past experiences at other companies. The developers all left before sunset on the day before D-day, and the atmosphere during the last couple of days was relaxed and cordial, even celebratory. For our final handoff to the client, we followed Alistair Cockburn's recommendation to videotape a design discussion. We pushed all our code and documentation over to their CVS repository, burned a CD containing the code and documentation, and celebrated with beer and foosball.

All in all, we were pleasantly surprised with the way our customers (both the developers and the business people) embraced XP during this project. They had some knowledge of XP when we started and were eager to learn more. The CTO had previously worked on a similar application and had some definite ideas on how to handle the complexities of the project, but was still receptive to an incremental and iterative approach. We found that XP worked well when dealing with our technically sophisticated customers. Rather than going around in circles when we disagreed, we could prove (or disprove) our design ideas using the concrete feedback of our code.

What to do next
In retrospect, though a couple of our customers did read the white book, we felt that it would have been useful if we had created a brief ''XP owner's manual,'' perhaps a few pages long. Such a document would include some items intended to educate the customer, such as the following:

* Planning game -- story creation, deferring in-depth discussion of each story until it is selected or needs clarification to be estimated.

* Acceptance tests -- expected customer input; what happens at the end of an iteration.

We would use other items as a checklist for discussion, such as the following:

* Bug fixes -- prioritizing bugs, possibly assigning them as stories.

* Documentation and status -- determining how much documentation and reporting is essential.

We also found that many aspects of our process worked better as we developed more trust in our relationship with our customers. With future projects, we would like to focus on whether projects with repeat customers do indeed bear out this observation.

References
1. Beck, Kent. ''Extreme Programming Explained.'' Reading, Mass.: Addison-Wesley, 2000.
2. Beck, Kent and Martin Fowler. ''Planning Extreme Programming.'' Boston: Addison-Wesley, 2001.

Acknowledgments
We thank Kevin Blankenship and John Sims for making the Tensegrent experience possible.


Natraj Kini is a founder of Agile Development, a Denver-based software engineering firm specializing in the XP methodology. He can be contacted via e-mail at [email protected]. Steve Collins is a founder of and senior architect for Agile Development. He can be contacted at [email protected].