The Agile Architect
Inhibiting Agility with Code Branching
We have the best software development tools in history. Why are our developers so afraid to refactor? Our Agile Architect explores how powerful code management tools can lead to powerful problems that inhibit agility -- and what you can do about them.
- By Mark J. Balbes, Ph.D.
- March 14, 2017
In the beginning, a developer worked alone. And it was good.
The software grew and prospered. As the flood of apps flowed, developers had to band together for safety and security, building yet larger apps. And chaos descended.
This is their story....
Agile technical best practices recommend ruthlessly refactoring code to winnow out bad designs in favor of better ones. This means that developers need to be able to fearlessly change large portions of the code base while not breaking the work of others. This is typically done by using various mechanisms to isolate everyone's work until it is ready to be shared whole cloth. But the very act of isolating work leads to behaviors that resist refactoring, the very thing the isolation was designed to foster.
How did we get here?
The Case For Source Code Branching
Whenever a group of people are working together to build something bigger and better than any one contributor alone can do in a reasonable time, it becomes important to have rules of engagement.
With software, once projects got larger than any one developer could implement on their own, we had to figure out ways to share our code without stomping on one another's work.
Initially, this usually took the form of creating an up-front detailed design, divvying up the different pieces, and having everyone retire to their individual computers to build their separate pieces. Once the work was done, some poor wretch had the thankless task of integrating all of the pieces together. While refactoring did not exist at this time, the individual developers were masters of their own code and could modify it to their heart's content.
Fast forward a bit. Projects get bigger and live longer, computers get better, and IT departments try to figure out better ways to share code among developers.
Some companies create a shared drive where all code is stored. Make a change to the code and everyone on the team gets the change. Break something and everyone suffers. Changing existing code (nascent refactoring) is discouraged as too risky. Better to write more, different code. But more code means more bugs, higher maintenance costs and inconsistent behavior in the app.
I worked at a medical software company that had an ingenious home-grown solution. They created their own exclusive locking system where files were stored on a remote server. All developer workstations had symbolic links to the files. Thus we could all build and run the software locally. To change a file, we ran a script that broke the symbolic link and made a local copy. If someone else already had a local copy, the script would fail gracefully. To share our changes, we ran another script that copied our file to the server and recreated our local symbolic link. Thus, we never had a code conflict because a file could be changed by only one person at a time.
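The lock-by-symbolic-link scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original scripts: the function names, the `AlreadyCheckedOut` exception and the `.lock` marker file on the server are all assumptions (the column does not say how the script detected an existing local copy), and it presumes a POSIX file system where symbolic links are available.

```python
import shutil
from pathlib import Path


class AlreadyCheckedOut(Exception):
    """Raised when another developer already holds the file."""


def link_workspace(server: Path, workspace: Path) -> None:
    """Give a workstation a symbolic link to every shared file."""
    workspace.mkdir(parents=True, exist_ok=True)
    for shared in server.iterdir():
        if shared.suffix != ".lock":  # skip the hypothetical lock markers
            (workspace / shared.name).symlink_to(shared)


def checkout(server: Path, workspace: Path, name: str, user: str) -> None:
    """Break the link and replace it with a private, editable copy."""
    lock = server / (name + ".lock")  # assumed way of recording the owner
    try:
        # Mode "x" fails if the marker exists, so only one developer wins.
        with open(lock, "x") as marker:
            marker.write(user)
    except FileExistsError:
        raise AlreadyCheckedOut(f"{name} is held by {lock.read_text()}")
    local = workspace / name
    local.unlink()                      # remove the symbolic link
    shutil.copy2(server / name, local)  # put a real file in its place


def checkin(server: Path, workspace: Path, name: str) -> None:
    """Publish the local copy, restore the link and release the lock."""
    local = workspace / name
    shutil.copy2(local, server / name)
    local.unlink()
    local.symlink_to(server / name)
    (server / (name + ".lock")).unlink()
```

Because every other workstation still points at the server copy through its link, a check-in is visible to the whole team immediately, and a second `checkout` of the same file fails until the first developer checks in.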
This exclusive locking philosophy was implemented in many commercial tools like Visual SourceSafe. The problem with exclusive locking was that you could get into deadlock situations. If Bob had a lock on File 1 and June had a lock on File 2, but each needed the other's file to proceed, then they were both stuck. They had to negotiate with each other to see who would release their lock and potentially stall or lose their work.
Configuration management tools like CVS introduced the concept of non-exclusive checkouts. Any developer could check out the entire code base (or a portion of it) to a local computer. They could change any file locally. When the code was sufficiently modified, they could check in all of their changes. If any file had changed on the CVS server, the entire check-in would fail. This was a good thing. The developer could pull the changes from the server to the local machine, fix any problems and then try to check in again. And since CVS knew they had pulled the latest changes from the server, it would now accept the changes.
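The check-in rule described here is a form of optimistic concurrency: nobody is locked out, but a stale working copy is rejected. The toy model below illustrates that rule only. It is not how CVS is implemented, and the `Server`, `WorkingCopy` and `OutOfDate` names, as well as the single repository-wide revision counter, are assumptions made for the sketch. Merging of files changed on both sides is deliberately left out.

```python
from dataclasses import dataclass, field


class OutOfDate(Exception):
    """The server has changes this working copy has not pulled yet."""


@dataclass
class WorkingCopy:
    files: dict      # name -> text, freely editable by the developer
    base_rev: int    # server revision this copy was last synced with
    pristine: dict   # name -> text exactly as it was pulled


@dataclass
class Server:
    files: dict = field(default_factory=dict)
    rev: int = 0

    def checkout(self) -> WorkingCopy:
        return WorkingCopy(dict(self.files), self.rev, dict(self.files))

    def check_in(self, wc: WorkingCopy) -> None:
        # Reject the whole check-in if anything changed since the last pull.
        if wc.base_rev != self.rev:
            raise OutOfDate(
                f"server is at revision {self.rev}, copy is based on {wc.base_rev}"
            )
        self.files = dict(wc.files)
        self.rev += 1
        wc.base_rev = self.rev
        wc.pristine = dict(wc.files)

    def update(self, wc: WorkingCopy) -> None:
        # Pull server versions of files the developer has not touched.
        # Files edited locally are kept as-is; reconciling them with the
        # server's version is left to the developer in this sketch.
        for name, text in self.files.items():
            if wc.files.get(name) == wc.pristine.get(name):
                wc.files[name] = text
        wc.pristine = dict(self.files)
        wc.base_rev = self.rev
```

A developer whose check-in is rejected calls `update`, fixes anything the incoming changes broke and tries again. The second attempt succeeds because the working copy is now based on the server's current revision.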
Developers learned to pull changes often from the server so their local copy didn't get too out of date with work being done in parallel. They also learned to check in their changes often for the same reason. This process was possible because CVS also had a powerful merge capability. If I had modified my local copy and I pulled the latest version of the file from CVS, it didn't overwrite my changes. It merged the changes from the server into my changed file. Occasionally this would lead to merge conflicts, where a change someone else had made would conflict with a local change I had made. Then I would have to deconflict the code, quite possibly by talking with that developer to understand why they made the changes they did.
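To make the idea of a merge conflict concrete, here is a deliberately simplified three-way merge. It compares the version both developers started from (`base`) with the local file (`mine`) and the server file (`theirs`), one line at a time. The function name and the restriction to in-place edits (no lines added or removed) are assumptions made to keep the sketch short; real merge tools align lines with a diff algorithm first.

```python
def merge3(base, mine, theirs):
    """Merge line by line; return (merged_lines, conflicting_line_numbers)."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both sides agree, or neither changed the line
            merged.append(m)
        elif m == b:      # only the server changed it: take their line
            merged.append(t)
        elif t == b:      # only the local copy changed it: keep my line
            merged.append(m)
        else:             # both changed it differently: a human must decide
            merged.append(m)
            conflicts.append(number)
    return merged, conflicts
```

Changes to different lines merge silently. Only a line that both sides changed in different ways is reported, and that is the case that sends you off to talk with the other developer.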
And this is where the pushback to refactoring starts. Any changes you make to code, especially if they are widespread, will potentially lead to merge conflicts, which can suck up a team's time and introduce bugs while providing no new user value. Fixing merge conflicts is a necessary evil that no one likes to do. And if you don't like to do something, you tend to adopt behaviors that minimize it. So, the less you refactor code, the fewer potential merge conflicts you introduce.
But it gets worse. CVS and similar tools also introduced the concept of code branches. A code branch is a separate version of the code that is being tracked by the tool. Kind of like an alternate reality. Different developers working on different features could work on different code branches, checking their changes in often to their code branch but not affecting other code branches.
This was great if you were developing a tricky feature and wanted to build the entire thing before sharing it with the rest of the team. (It can also be used for code promotion purposes, for example, promoting from an in-development state to a QA Testing state to an In Production state, but that's outside the scope of this article.) The benefit of isolation allows for rapid development, but it comes at the cost of not being able to quickly share your code changes with the rest of your team, who are working on their own code branches. Delaying the sharing of code leads to even more merge conflicts, again reinforcing the behavior to minimize code refactoring. Although code branches were convenient to use, they took time to create and took up precious disk space. This provided some pushback on rampant branch creation and kept branching to a reasonable level.
Fast forward to today where we have tools like Git that make it almost instantaneous to create code branches with little impact on disk space. You can create branches off of the main version of the code, often called Master, or you can branch off of other branches. Then you can merge your branch back into Master or into other branches. You can even cherry-pick some of your modified code to merge into another branch. And the more you use these powerful tools, the more you have problems with merge conflicts as two, three, four or more branches are running in parallel, changing code in different ways. I have seen teams, more often than I care to admit, throw away a feature that is completed on a branch because merging it back to Master is harder than rewriting the feature based on the latest code in Master.
Let's look at an example.
Suppose your team has an existing application with a sophisticated UI. As part of building the UI, the team follows a standard Model-View-Controller (MVC) design. But up until now, they've done it by convention. Every view has a custom written MVC.
As you and your pair programmer write yet another MVC implementation, you realize that life would be easier if there were some base classes to perform the common MVC tasks. This would also ensure uniform behavior across screens rather than depending on each custom implementation to coincidentally do the same thing.
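What such base classes might look like depends entirely on the application, which the column does not describe. As a purely hypothetical sketch, assume the common tasks are change notification in the model, re-rendering in the view and input validation in the controller. Every class and method name below is invented for illustration.

```python
class BaseModel:
    """Holds field values and notifies listeners when one changes."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def get(self, name):
        return self._fields[name]

    def set(self, name, value):
        self._fields[name] = value
        for listener in self._listeners:
            listener(self)


class BaseView:
    """Re-renders itself whenever its model changes."""

    def __init__(self, model):
        self.model = model
        self.rendered = []  # stands in for drawing on a real screen
        model.subscribe(self.refresh)

    def render(self, model):
        raise NotImplementedError  # each screen supplies its own layout

    def refresh(self, model):
        self.rendered.append(self.render(model))


class BaseController:
    """Validates user input and applies it to the model."""

    def __init__(self, model):
        self.model = model

    def validate(self, field, raw_value):
        return raw_value  # screens override this with their own rules

    def handle(self, field, raw_value):
        self.model.set(field, self.validate(field, raw_value))


class PatientView(BaseView):
    """One concrete screen: only the screen-specific part is written."""

    def render(self, model):
        return f"Patient: {model.get('name')}"
```

With the plumbing in one place, a new screen overrides only `render` (and `validate`, if it needs one), so every screen reacts to model changes the same way by construction rather than by coincidence.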
So you turn around to the team and discuss the benefits, and everyone thinks it's a good idea.
Then you start to talk about how you are going to do it. Of course, refactoring out the base classes is straightforward, if tedious. Start with one MVC, refactor it to use the base classes, run the automated tests and fix anything that's broken. Do the same for each of the four other screens that the team has already built.
But that's only in one branch of the code. What about all the others? How do you propagate the changes?
So you start discussing strategies to propagate the changes. Someone suggests making the changes on your branch, merging to Master, then merging down to the four other branches in flight. But you can't do that until you finish with your feature or all of your unfinished code will pollute the other branches. Exactly what you were trying to avoid by creating branches in the first place.
Someone suggests creating a new branch off of the current Master, making the changes, merging back into Master, then merging Master into the five ongoing feature branches, dealing with the inevitable merge conflicts that will arise in each of the five branches. And, of course, since the four screens are in all five branches, that's 20 merges. But for a given screen across each branch, you want to make sure the merged code looks the same so there aren't merge conflicts as all the branches eventually merge the same but independently modified code back to Master. That strategy seems reasonable. Now who's going to do it? Everyone stares blankly.
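The count of 20 is simply every screen paired with every branch. A few lines of Python make the growth explicit; the screen and branch labels are placeholders, not names from a real project.

```python
from itertools import product

# Placeholder labels for the four existing screens and five feature branches.
screens = [f"screen {n}" for n in range(1, 5)]
branches = [f"feature branch {n}" for n in range(1, 6)]

# Each (branch, screen) pair is one place where the refactored code
# has to be reconciled with that branch's in-flight changes.
merges = list(product(branches, screens))
print(len(merges))  # prints 20
```

Add a sixth branch or a fifth screen and the list grows by a whole row or column, which is why the same refactoring looks small on a single code line and large on a heavily branched one.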
Perhaps, one of your colleagues says with slight desperation, now is not the right time to introduce this large refactoring. (Large? Really? It's only large because of all of the code branches.) Perhaps the team should wait until things are less hectic and there is less chance of introducing bugs through merge conflicts. Perhaps we should get the next release out the door first, and then worry about this refactoring.
So you turn back around to your computer and implement yet another custom MVC solution.
Final Thoughts
If branching is bad because it inhibits ruthless refactoring and therefore code agility, then perhaps we shouldn't have any branches. But this takes us full circle back to all the original problems we had that led to the creation of these tools in the first place. Or does it? We'll talk about that in my next column.