People should check out AccuRev (the company where Damon Poole is CTO). They make a pretty cool source control system that I’ve been using for about six years, and they do an awesome job of making merging and parallel development dead simple for many users on the same code base. I’ve used a lot of ClearCase and Perforce in the past, and I’ve become a big AccuRev fan.
The accretion metaphor is McConnell’s idea (page 15, Code Complete, Second Edition); Jeff forgot to link/mention it this time.
Note that a successful compile does not mean the code has to do anything meaningful; you could implement stubs and empty interfaces for the pieces that don’t have any useful code yet.
I suppose I could see that for new work. The only real problem I have is that when I’m working on a big new bit of code, the classes (and thus file names) are in a rather constant state of flux. Files are guaranteed to get renamed or merged, and sometimes a file with the same name might even reappear later with different contents and function. With Git that might not be too big of a deal, but that would give our current revision control system fits.
But that’s for new development. The other common issue I have with things being broken for a long time is when I’m refactoring really messy code. A typical activity is to split a humongous file into two or more smaller files, and then spend the next several days trying to move the various other little bits around so that the result will compile.
Hi Jeff,
I’m guessing that you didn’t mention branches because YOU are in charge of the release cycle, and therefore it doesn’t matter what you check in to the trunk (because you can assume you’ll make time to fill things out before release).
Your advice can only hold for larger teams, or for teams with multiple parallel dev streams (e.g. feature A releases halfway through feature B’s development), if branches (or a DVCS) are in use.
You have said before that many developers do not understand simple VC systems, let alone distributed ones.
@MattRidley, I suggest that you do your check-ins to branches, prepare a merge/integration test environment for compliance purposes, and then check that in to the trunk once all the tests have passed.
Nij
Look on the bright side: the development lead on a team adjacent to ours has decided that the right ClearCase strategy is to have everyone develop directly on main, doing unreserved checkouts and merging manually when they’re finished.
Jeff,
Thanks for another posting on version control. I enjoy reading about your experiences with it, and because you have so many folks posting back on the topic, I often learn something from the comments as well.
For my 2 cents on the matter: yeah, I agree that check-in early and often is a sound policy. I also believe all work should be done in branches, with some designated coordinator having final say and doing the merging from the branches into master. I’ve seen lots of folks get burned big time on a project because some code cowboy, late on a Friday afternoon, thought some new break-the-api-but-it’s-so-cool-dude!! feature was a great idea, checked it into the trunk, left for the day, and hosed everybody.
I started out with Subversion about 3 years ago and liked it. It was easy to learn, it came with any current Linux distro, and it worked pretty fast. It was easy to make branches. The only problem I could see at the time was the merging of branches, which could be technically hairy. Hairy enough that my implementation of merges was:
a) Remove old trunk
b) Rename working branch as trunk
That sort of works if you’re the only guy on the project, but clearly it’s not the way to do branches. I eventually figured out how to determine the start and end of the branch that I wanted to merge with trunk, but it was still awkward and brittle. Any history changes in the branch involving files/directories that didn’t exist in the trunk could bork the commit, and I would end up having to merge the branch in small chunks. This clearly was not the way to do merges.
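For what it’s worth, here’s a sketch of how the merge could go instead of “delete trunk, rename branch”, on a throwaway local repository (paths and names are made up; svn 1.5+ tracks mergeinfo, and recent versions reintegrate a branch with a single plain merge):

```shell
# Build a tiny repo with a trunk, a feature branch, and one branch commit,
# then merge the branch back into a trunk working copy.
tmp=$(mktemp -d)
svnadmin create "$tmp/repo"
svn -q mkdir -m "layout" "file://$tmp/repo/trunk" "file://$tmp/repo/branches"
svn -q copy -m "branch" "file://$tmp/repo/trunk" "file://$tmp/repo/branches/feature"
svn -q checkout "file://$tmp/repo/branches/feature" "$tmp/feature"
echo work > "$tmp/feature/new.txt"
svn -q add "$tmp/feature/new.txt"
svn -q commit -m "feature work" "$tmp/feature"
svn -q checkout "file://$tmp/repo/trunk" "$tmp/trunk"
cd "$tmp/trunk"
svn merge "^/branches/feature"     # one command; mergeinfo does the bookkeeping
svn -q commit -m "merge feature branch into trunk"
```

Trunk keeps its history, the branch keeps its history, and nobody has to rename anything.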
A buddy whom I had introduced to git (he was new to Debian) fell in love with it, used it in all his home projects, and got on my case to try it out, which I did in January this year. Coming from Subversion, getting used to git was a learning curve. There are so many commands in git (~180 vs. ~40 for Subversion) that I couldn’t grok it all. What finally rescued me was realizing that I only use a small subset of the commands anyway (init, clone, diff, log, status, commit, checkout, branch, stash, pull, remote, add, rm) and that I only needed to focus on the fundamentals. Once I got those down pat, life was good.
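To give an idea of how small that subset really is, here’s that everyday cycle run end-to-end on a throwaway repo (file names are made up):

```shell
# init, add, commit, branch, commit, switch back, log: a day's worth of git.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com   # only needed on a fresh machine
git config user.name "You"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "First commit"
git checkout -q -b experiment           # create a branch and switch to it
echo "more" >> readme.txt
git commit -q -am "Try something"
git checkout -q -                       # back to the previous branch
git log --oneline                       # one commit here; two on 'experiment'
```

Everything happens locally, which is also why every one of those commands returns instantly.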
Git comes with a graphical visualizer called gitk that makes it easy to see what’s going on with the branches and who branched from whom, and who merged with whom, and what the current status is, etc.
So why is git better than subversion for me:
- It’s fast. Turbo fast. Ridiculously fast. Subversion isn’t noticeably slow, but git is practically instantaneous in comparison. A check-in with git takes on the order of a millisecond; with Subversion, 1 or 2 seconds on a local filesystem, longer if you’re dealing with a network-based repository.
- The decentralization. Holy parallelization, Batman!!! No more stomping on each other’s toes. No more code cowboy committing crap into the master branch and borking everybody. The only canonical copy of the repository is the coordinator’s. He/she pulls from the developers, reviews what they’ve done, and if it’s good it goes into his/her master branch. If it needs touch-up, the coordinator does it (or, better yet, tells the offending developer to fix it or it doesn’t go in). The developers then pull from the coordinator’s master branch and merge with their own, then merge their local master with their development branch.
On a large project you can’t do that with a centralized repository, for several reasons: bandwidth gets sucked up, and if the centralized repository goes down there are no check-ins, merges, or checkouts: nothing. You can still work on your local copy, but you can’t commit the history of what you’re doing. And there is always the risk of some code cowboy committing something into your branch that could bork your build, or worse: the code cowboy commits into master, you pull from master, and everybody is hosed.
Torvalds, in his Google talk (http://www.youtube.com/watch?v=4XpnKHJAok8), claimed you should never underestimate the effect that the speed of committing, checking out, merging, etc. has on development. To be honest, I was skeptical of the claim, but after spending 8 months with git I see his point. Damn, git is fast!!!
Jeff, I urge you to give git a try. I think you’ll be surprised how good it is. Even if you believe there’s not much need for it on a project with just a few folks, I’m telling you that git really is that good.
I wanted to give you some data points on why decentralization is better than centralization for development purposes:
Microsoft starts to use scrum to help development (late 2005):
http://www.eweek.com/c/a/IT-Management/Microsoft-Lauds-Scrum-Method-for-Software-Projects/
I don’t know if it was applied to Vista or not. But here is a Microsoft developer’s perspective on why Vista was delayed and why it didn’t have the features folks wanted:
http://blogs.msdn.com/philipsu/archive/2006/06/14/631438.aspx
Unfortunately he doesn’t mention which version control software they used. Several years ago someone at Microsoft posted the workflow that Windows NT (2000? XP?) kernel developers used with version control. I’ve tried to remember the URL, but I can’t, and I haven’t been able to find it on Google either. I do remember that the version control was something Microsoft had developed for internal use, and it was definitely centralized. Some of the highlights: developers only had limited access to the source code; commits had to go through a gauntlet of management; and builds were staggering, because instead of just recompiling the parts the developer had written and unit testing that part, the whole source code for the Windows kernel and the Microsoft products had to be recompiled and tested. They had something like 4 build servers, and the turnaround time between submission and a response was ridiculously long: something like 2 weeks, I think.
Here is how development on Windows 7 is being done:
http://blogs.msdn.com/e7/archive/2008/08/18/windows_5F00_7_5F00_team.aspx
Lastly, as a comparison, here’s a paper by the USB subsystem maintainer for the Linux kernel, Greg Kroah-Hartman:
http://ols.108.redhat.com/2007/Reprints/kroah-hartman-Reprint.pdf
The PDF includes all sorts of stats on the development process, but the thing that gets me as a Linux user is how much code gets checked in/modified/dropped, etc., and it still works, with releases every 3 months or so. They’ve got something like 1100 developers, and the only way they manage to keep up the pace is a) decentralization and b) a pyramid structure of trusted developers who do the pulling and merging.
Shouldn’t check-in be an automatic text/code editor function in an IDE?
Writing code that always compiles when you hit save shouldn’t be that difficult for professionals.
Jeff, your rss feed hasn’t been updated since July 29th. Don’t know why, it just hasn’t.
A properly configured SCM environment allows users to have personal branches, so checking in early and often is a very good thing. When you want your code to mix it up with other developers’, merge it… There are very good tools and methods that support merging. Merging is your friend!
I think many people use continuous integration in the wrong direction, and that is where the faulty “check in often” idea is born.
Imagine I’m working on something big, e.g. a new file system.
Continuous integration doesn’t mean I add my changes to the build every day (that would disturb everybody’s work for months).
Continuous integration means I have to download all changes made by others into my private build on a daily basis, and I have to build the whole shebang myself to make sure my new module integrates perfectly.
Then, someday, when my module is mature, I can check in without breaking everything.
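With git, that “integrate downward daily” habit is a couple of commands. A minimal demo in one local repo (the branch name new-filesystem is made up, and the shared trunk is simulated locally):

```shell
# Shared code exists, I branch off for my big module, someone else
# changes the shared code, and I fold their change into MY branch
# daily instead of pushing my unfinished work at them.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email me@example.com
git config user.name "Me"
echo core > core.c && git add core.c && git commit -q -m "shared code"
trunk=$(git symbolic-ref --short HEAD)    # 'master' or 'main'
git checkout -q -b new-filesystem         # long-lived private branch
echo fs > fs.c && git add fs.c && git commit -q -m "my big module"
git checkout -q "$trunk"
echo fix >> core.c && git commit -q -am "someone else's change"
git checkout -q new-filesystem
git merge -q --no-edit "$trunk"           # daily: their changes, into my build
# ...then compile and test the whole thing privately before ever checking in.
```

The trunk never sees my half-finished file system, but my private build always sees the trunk.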
I think many people get it wrong because they have (only) one big build environment where the whole build is compiled, which is wrong. The truth is, every project needs its own big build environment where they can build the big thing (fast enough).
I’ve mainly worked with CVS and then SVN, but I’ve satisfied my urge to check in as I think, in tiny increments, by having a SEPARATE store.
For years I used VOODOO Personal on the Mac; now that it has failed to make the jump to OS X, I am using Mercurial. That allows me to have a personal repository which can be unbuildable, and still check in to the client repository when I have a contribution sufficiently complete to fit in with the rest of the team. I have suffered working at a site which had a philosophy of allowing the trunk to be broken (strong personalities), and I would never wish that on anyone. Using multiple repository technologies satisfies things all around.
There is some excellent albeit expensive commercial software available for automating and streamlining check in and other team based activities.
Nicolas,
You can only merge ranges in SVN if they’re sequential. If someone checks in between two of your commits, you have to merge each individually.
Hence holding back until I’m reasonably sure that the sum total of my changes constitutes the entirety of the feature or fix. So that when I have to merge it later, it’s just one merge instead of, say, 17.
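Here’s what those individual merges look like in practice: with someone else’s commit interleaved between mine, I cherry-pick each of my revisions with -c instead of merging one contiguous -r range. The demo below builds a throwaway repo so the revision numbers are concrete; in real life they would come from svn log:

```shell
# Trunk gets my r3, someone else's r4, then my r5; the release branch
# (made at r2) wants only my changes, so that's two merges, not one.
tmp=$(mktemp -d)
svnadmin create "$tmp/repo"
svn -q mkdir -m "layout" "file://$tmp/repo/trunk" "file://$tmp/repo/branches"   # r1
svn -q copy -m "branch" "file://$tmp/repo/trunk" "file://$tmp/repo/branches/release"  # r2
svn -q checkout "file://$tmp/repo/trunk" "$tmp/trunk"
echo mine-1 > "$tmp/trunk/mine.txt" && svn -q add "$tmp/trunk/mine.txt"
svn -q commit -m "my change" "$tmp/trunk"                                       # r3
echo theirs > "$tmp/trunk/theirs.txt" && svn -q add "$tmp/trunk/theirs.txt"
svn -q commit -m "someone else's change" "$tmp/trunk"                           # r4
echo mine-2 > "$tmp/trunk/mine2.txt" && svn -q add "$tmp/trunk/mine2.txt"
svn -q commit -m "my other change" "$tmp/trunk"                                 # r5
svn -q checkout "file://$tmp/repo/branches/release" "$tmp/release"
cd "$tmp/release"
svn merge -c 3 "^/trunk"    # my first change...
svn merge -c 5 "^/trunk"    # ...and my second, skipping r4: one merge per commit
```

Multiply that by 17 interleaved commits and you can see why people hold their changes back.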
I think the best thing is to use something like git-svn, you get git for your local system, and svn to share code with others.
I absolutely agree that developers should check-in often. It is crucial to integration. I think Erik’s post about continuous integration going in the wrong direction is an interesting point and worth more consideration. In addition, I haven’t used git, but svn has worked really well for the teams I’ve been on.
One voice here in favor of infrequent check ins to the trunk, and major use of private branches.
If you use CVS or other unsophisticated version control, you check in often because the pain of branch/merge is large. But there is a huge penalty for this: no large, comprehensive refactoring is ever really possible. Something that requires someone to rip into the whole design, reorganize, and re-gel the whole thing isn’t feasible unless you can branch, do that work, merge, fail because of unexpected effects, fix up, merge again, etc. A good example of unexpected results would be that all tests pass, but backward compatibility with prior releases isn’t perfect. (Generally, you’ve discovered that there are missing tests… gee, does that ever happen?)
To avoid disrupting everyone else, the penalty for someone merging in faulty code must be fairly low. I.e., the trunk is broken because I tried to merge in my new refactored stuff, which is tested, but something unexpected is going on… gotta go develop some more tests. In the meantime, the rest of the team can simply branch around me, or it’s painless to back out the check-in on the trunk.
These kinds of big refactorings are not the everyday stuff of software design; they’re more like monthly events. But it’s like unit testing: if you make it even slightly difficult, it will just never get done, and if you introduce fear of doing it, by way of rules like never ever break the trunk, you will simply never make major changes to a software design.
Yup. I’m working on a project just now where one of the developers had source code checked out for more than a week and then, instead of checking in, pulled an old copy down from the server, and that was all the work gone!!! That meant we missed a test cycle and had to wait for the next one. This was also on a laptop, so if they had lost that, then again it would all be gone. You can never check code in too often, in my opinion.
I take the approach of “Always check in the code, even if it doesn’t work or isn’t finished; you’ll still be able to change it later!” Then there’s no chance of losing the code. When it’s all done, everyone’s happy with their piece of code, and each piece has passed unit testing, we create a version and release it for business test. Then you only ever roll back to working versions. By working versions, I mean versions that actually compile and can be installed; obviously they may have bugs, but those will be fixed the next time we do a version.
@BugFree: Unfortunately it seems the ‘bleeding obvious’ is not always so obvious to some people.
@Paul Souders: The philosophy in the UNIX world may well be small modules loosely coupled, but a pearl is small modules extremely tightly bound up. A string of pearls, though…
A pearl is also pretty much an oyster’s allergic reaction to a piece of grit, so your analogy’s mileage may vary 