Check In Early, Check In Often

I blame this (also) on piece-of-shiitake source control systems (like ClearCase). Guess what: if committing is a pain in the behind, people will put it off as much as they can.

I think I can even improve on that, though. You could use a batch file (or shell script) to compress the files and copy them into that directory!
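Such a script doesn't need to be more than a few lines. Here's a minimal sketch, with entirely hypothetical paths (~/work as the project directory, ~/backups as the directory that actually gets backed up):

    #!/bin/sh
    # Zip up the working directory with a timestamp and drop it into a
    # backed-up location. All paths here are placeholders; adjust to taste.
    STAMP=$(date +%Y%m%d-%H%M%S)
    tar czf "$HOME/backups/work-$STAMP.tar.gz" -C "$HOME" work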

Well, yeah. If it's that much work, it's probably easier to just check your stuff into a revision control system (that is already being backed up).

However, here the IT folks periodically back up our user profiles, so as long as I work under there I'm getting backed up. Better yet, I don't have to remember to do it manually.

Ditto with the file versions from Emacs. I don't have to stop and think about which versions to save. I don't have to run a special VC command; I don't have to do anything. Thus there's nothing to mess up.

Plus, at the end of it all, I don't have a jillion junk versions of files saved forever in our production revision control system as if they were important pieces of code.

Another post reminiscent of Code Complete; BTW, that’s a good thing!

What DVCS does is make important concepts explicit, like patch review and branching. Branching and merging can be done with CVS and SVN; the trick is that SVN fixed the wrong half of CVS. CVS took a long time to branch or merge, and what SVN advertised was really fast branching.

But merging is where the pain is. You want to resolve as many conflicts as possible automatically and correctly. This is something a DVCS has to think about and get right, because merging is its intended use case, whereas many groups refuse to use branching in SVN or CVS because merging is too painful.
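For what it's worth, that's roughly why merging in a DVCS like Git feels so cheap; everything that doesn't genuinely conflict is resolved for you (the branch name below is made up for illustration):

    git checkout main
    git merge feature/parser    # non-overlapping changes merge automatically
    # only real conflicts stop the merge; fix them, then:
    git add -u
    git commit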

I hear ya, Jeff, but unfortunately where I work, if it's checked in, it's supposed to have been through the SOX controls and be ready for the next release.

Crazy system, I know.

At my previous job I used to just check in at the end of the day, and I considered the CVS repository more of a backup system than a version control system. I didn't have to work on code together with anyone else much, so it wasn't a big problem.
However, when we changed to svn I read the svn book front to back and got more insight into version control. Coincidentally, at college I had to use svn for a project shortly after that, and we had a class about it. Because I had to work with other people on the same code, I quickly picked up good practices and checked in after every feature, bug fix, or whatever small change.
Now, at my new job, even though I work alone on the code most of the time, I still employ this workflow, and it works really well. Every little fix or change is a revision, which makes restoring and reviewing easier too.

"Check in early and often" is all very exciting if you're working on new code or working alone, but if you are working on shared code as part of a team, checking in bad stuff will just hurt everyone. I agree that hoarding a bunch of changes is bad too, but if your code will affect your colleagues, you have to do enough unit testing to be sure you won't pollute the pond before checking it in. The balance between getting something working independently first and sharing the burden of getting it working is a tough one.

It helps to design systems so that biggish chunks are independent of one another to minimize the crosstalk - good design anyway - but there will always be core code that's shared, where you have to be careful before checking in changes…

Ever heard of tools like Perforce? You work in a branch, and integrate from the main trunk into yours, so your code is always up to date with main. Then when you’re finally ready, you reverse integrate into the main trunk.

Check in as often as you like; you won't break the main build. Integrating with main doesn't affect the rest of the team. And the tools do most of the merging for you, except when there are actual conflicts. Works great.
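A rough sketch of that Perforce workflow from the command line, with made-up depot paths (treat it as an illustration, not gospel):

    # pull the latest mainline changes into your dev branch
    p4 integrate //depot/main/... //depot/dev/alice/...
    p4 resolve -am                 # auto-merge whatever doesn't conflict
    p4 submit -d "Sync dev branch with main"

    # when the feature is ready, reverse integrate into main
    p4 integrate //depot/dev/alice/... //depot/main/...
    p4 resolve -am
    p4 submit -d "Merge feature back into main"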

Branching, shelvesets, checking in often, Get Latest often…

… none of it matters. No, truly.

What’s important is communication.

The problems with source control happen when developers don't communicate with each other, as peers and professionals, about what they're doing.

Whether the checkins happen hourly, daily or weekly is irrelevant. What’s important is that developers talk to each other.

I don't usually check in stuff that is not completed.
I just shelve it, which means it is stored in version control but isn't included in the official version.
At least Team Foundation has this feature, and I'm pretty sure other version control systems have it too.
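In Team Foundation Server that's the tf shelve command; something like this, where the shelveset name is just an example:

    tf shelve wip-login-refactor /comment:"Not ready for the nightly build yet"
    tf unshelve wip-login-refactor

The first line parks your pending changes on the server; the second brings them back into your workspace later.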

Jeff, I couldn't agree with you more. I also agree with most points in Anders Sandvig's post. Most developers tend to treat source control only as a backup system, which is certainly not its real purpose.

- Ritesh.

Early and often can be subjective. Does often mean 5 times a day or 50? Check-in frequency is subjective and should be decided case by case. In a team, if one developer is working on something that the other team members depend on, then the code should be committed as soon as it is ready for use. If there isn't any dependency, the frequency can be cut down to twice a day.

Imagine working in a team of 10 developers where everyone checks in code hourly; does that mean I have to update my code every hour too? Then most of the time would probably be spent merging and integrating code. This can also break the momentum of actual development work.

Lastly, on a big project that takes 5-10 minutes to build, most developers will wait for the build to complete and make sure everything is working before they integrate and commit their code. So will everyone be content to commit their code only after a successful build?

http://techrockguy.blogspot.com/2008/05/source-control-with-continuous.html

One of the concepts that I really like to try for is this: if you can't check in a working, tested system at night, roll everything back and start fresh the next morning.

It seems frightening, but here are the possibilities:

  1. You were nearly done but just couldn’t get it to work.
    –The next day you will have all that work done in 2 hours, and it will be much better code. Chances are you were confused by the end of the day anyway, and without doubt that showed in your code.

  2. You were nowhere near done.
    –Well, obviously you took on too big a chunk; break your task apart better the next day and take on a smaller one. You will understand it better and be able to implement it more quickly.

The alternative: at the end of every subsequent day, you end up saying "Damn, now I REALLY can't afford to throw away all this work," and you raise the chances of having a difficult merge.

Don't underestimate the power of rewriting your code either. Take the chance at every opportunity you get! This is how you learn and grow, and it also saves HUGE amounts of time the next time someone has to touch the code (which, more often than not, will be before your current deliverable ships).

Coding quickly, fewer keystrokes, expressiveness, elegance, etc. are all useless crap.

The only things that matter are that you have readable, understandable and DRY code, and this usually only comes through spending some time with your code and rewriting it a couple times.

Ever heard of tools like Perforce? You work in a branch, and integrate from the main trunk into yours, so your code is always up to date with main. Then when you're finally ready, you reverse integrate into the main trunk.

What he’s talking about is making sure your changes don’t suddenly appear to everyone else fully formed, like Athena jumping out of Zeus’ head grown up and in full armor.

For you Perforce, Bitkeeper, and Git users, the way to translate this post is that he is suggesting you make sure to do that reverse integration into the main trunk every day.

Your basic premise is hard to argue. However, in reality, checking in as often as you advocate just isn't realistic for the software projects that I've worked on. Sure, you can check in code that compiles, but if it doesn't work, it could prevent someone else from testing their feature. Checking in the way you advocate requires a lot more communication as team size grows, which in my opinion isn't a great use of time. Plus, IDEs such as IntelliJ IDEA keep a local history automatically, so you don't need to check in until your feature is complete. I'm a fan of your blog, and I agree with most of your posts, but I just can't get behind you on this one.

I agree that painless merging is absolutely a strength of DVCS but this is also possible via branching and merging in centralized systems.

@Jeff: Yes, but IMHO a DVCS makes it easier, especially those that include some kind of history-rewriting tools, like amending the last commit, or rebasing (in Git), transplanting (in Mercurial), or grafting (in Bazaar) a branch onto a new base.

And with a centralized VCS you have to allow creation (and deletion, and renaming) of branches, and develop some convention (or some tooling) to avoid naming conflicts between branches, e.g. branch namespaces like login/branchname, initials/branchname, or login@branchname. All of this assumes the centralized VCS has good support for merges; all distributed VCSs have it because they have to, while Subversion before version 1.5 (never mind CVS) had very poor support for easily merging branches.
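To illustrate the difference: in Git, cleaning up your last commit is a one-liner, while the centralized equivalent of a private branch needs an agreed-upon namespace (the server URL, login, and branch names below are invented):

    # DVCS: fix up the last commit in place before anyone else sees it
    git commit --amend

    # centralized: a per-developer branch under a naming convention
    svn copy http://svn.example.com/repo/trunk \
             http://svn.example.com/repo/branches/alice/fix-login \
             -m "Private branch for the login fix"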

But I agree completely with the software accretion model. If, besides "check in early, check in often" (or "commit early, commit often"), you also follow "one feature per commit," then when there are bugs in the code it is easy to find them by bisecting the history, aka diff debugging. See:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
http://kerneltrap.org/node/11753
http://boinkor.net/archives/2006/11/using_git_bisect_to_locate_bug_1.html
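For the curious, a bisect session looks roughly like this (the known-good tag is a placeholder; you supply whatever revision you trust):

    git bisect start
    git bisect bad                 # the current revision is broken
    git bisect good v1.2           # placeholder: last revision known to work
    # git checks out the midpoint; build it, test it, and report:
    git bisect good                # ...or: git bisect bad
    # repeat until git names the first bad commit, then clean up:
    git bisect reset

If your test is scripted, git bisect run can drive the whole search automatically.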

@J.Peterson:

You beat me to it! I learned that pattern when I was a heavy Perforce user. I branched every time I was heading into unfamiliar territory, even if only for a simple change. I’ve long done this as a solo developer, and when client needs would force me to change a parent or sibling branch before I was finished with my digression, I’d merge back (up and/or) down to bring my exploratory branch up to date.

To me, this working style is essential to any XP-ish development style, since it's a powerful safety net, allowing complete freedom to refactor mercilessly. I've never worried that a series of refactorings would fundamentally break anything, or that I'd work up a series of refactorings only to end up stuck, somehow, and lose the entire series because I wasn't able to roll back.

I'm a Subversion user now (not because I prefer it over Perforce, but, you know, because). With svnmerge (http://www.orcaware.com/svn/wiki/Svnmerge.py), it still isn't p4 integrate, but it's good enough to keep me using the patterns.
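The pattern itself is plain Subversion (svnmerge just automates remembering which revisions have already been merged); a sketch, with placeholder URLs and a placeholder revision number:

    # branch before heading into unfamiliar territory
    svn copy http://svn.example.com/repo/trunk \
             http://svn.example.com/repo/branches/refactor-io \
             -m "Exploratory branch for the I/O refactoring"
    svn switch http://svn.example.com/repo/branches/refactor-io

    # periodically pull trunk changes down into the branch
    # (1234 stands for the last revision already merged)
    svn merge -r 1234:HEAD http://svn.example.com/repo/trunk .
    svn commit -m "Sync branch with trunk"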

@T.E.D.:

However, there are other ways to do those things. A good editor (e.g., Emacs) will keep old versions of files around for you. I have mine set up to keep the last 10 versions. VMS used to do that for you too.

Working out of a directory that is periodically backed up is all you really need to do for backups. If you can't do that, then you'll have to periodically copy your work into a place that is backed up, just to be safe. Leveraging the revision control system for this does seem sensible. However, if your dev machine is being backed up, you really don't need to use your revision control system for backups.

Both of these approaches lack some fundamental development benefits I fear you’ve overlooked.

  • These aren’t backups. They are commits. They may be commits of works in progress, skeletons, etc., but they are you -committing- and annotating artifacts of your process. These commits offer visibility to other members of your team, a record of your thinking, and, as I mentioned above, a safety net allowing you to roll back to a state you previously explicitly declared useful.

  • When you have a well-appointed development infrastructure, these interim commits offer a basis for automated testing tied to checkins.

No, you might not want to know about every regression you’ve introduced in the interim steps, but if you check in immediately after a successful compile while you continue your thought, you’ll be able to check the results of a background test suite. When you have this policy tunable on a per-branch basis, you have a continuous dashboard view into the progress of your work. This is especially important when you’re doing a resync merge of the mainline or parent branch down into your dev branch.

  • Whether or not you run a full unit and regression test suite, the success of an automatic build triggered on checkin after a successful local build at -least- gives you confidence that all of the necessary artifacts for a build are in the repository, and that a clean checkout into a pristine build environment produces results equivalent to those in your local sandbox. (A minimal hook sketch follows this list.)

  • As a corollary to my commits point, above, periodic backups capture a state in time, presumably scheduled, whereas -commits- capture an explicit, meaningful state in your process. Do you care about restoring what you were working on an hour ago, or what you were working on before you broke what you broke?

  • Periodic backups, depending on how you implement them, and editor backups, almost universally, still leave the history on your local machine. This means you don’t get any of the durability advantages offered by your SCM hosting platform. I.e., if you’re on an enterprise SAN, or even a beefy server, you may get local and remote replication, automatic copy on write snapshots of the repository, etc.

  • Apropos the "before you broke what you broke" point above, adopting this continuous-commit practice makes Binary Search Debugging (http://www.joelonsoftware.com/news/20030128.html) not only easier but, fundamentally, -possible-.
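Here is the minimal hook sketch promised above, using Git as the example; the build command is a stand-in for whatever your project actually runs:

    #!/bin/sh
    # hooks/post-receive on the central repository (sketch only):
    # check the pushed revision out into a scratch directory and run the build.
    SCRATCH=$(mktemp -d)
    git --work-tree="$SCRATCH" checkout -f
    cd "$SCRATCH" && make test    # stand-in for your real build/test command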

These benefits do not solely accrue to those using SCM on teams. My rigorous SCM discipline has paid off throughout my (considerable) history as a lone gunman. We can sometimes be our own worst teammates, and these practices help protect us from ourselves.

A nice article, but it applies mostly to centralized version control systems. With DVCSs like hg and git you can pretty much have your cake and eat it too. You can check in broken code as much as you want so that the history is recorded and you can revert when you want, and then, when you feel the code is good and won't break the builds, you just push it upstream.
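In Git terms the whole trick is just this (the remote and branch names are whatever your project uses):

    git commit -am "WIP: half-finished parser, does not build yet"   # stays local
    # ...more local commits as the work evolves...
    git push origin master        # publish only once the code is good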

That's actually the big disadvantage of distributed version control systems – it is so easy to leave the pack and go off into your own world.

I had been a ClearCase administrator for almost 15 years. In ClearCase, branching is extremely easy, and merges are quite simple. The standard way you use ClearCase is to give each developer their own branch, have the developer do their work, and then merge it back into the integration branch.

With ClearCase, developers could base their code on another developer’s branch, and deliver their changes to that branch. Or they could checkout their code from the main branch, make a dozen sub branches testing a wide variety of stuff, and then deliver whatever changes they want to the main branch. In other words, you used ClearCase as you would a distributed version control system.

A typical ClearCase shop would have two dozen or so developers. Each one would check out code from the main line onto their branch, and then merge their changes back onto the main line once they were satisfied with their changes. Developers could create as many development branches as they pleased, have branches coming off of branches, and even share branches.

My job as the CM was to keep the developers in line. I ran reports looking for developers who hadn't checked their changes back into the main branch in a few days (we actually had a case where one developer didn't check any code into the main branch for 11 months!). I also watched what the developers were working on. Were they making changes to class interfaces? Did everyone else know about that? Were two developers working on incompatible changes?

The worst part was release deadlines. Suddenly, developers would merge all of their changes needed for a release and nothing would work. QA was furious because getting everything to work again went against their testing schedule. At one job, the entire QA team quit en masse in protest.

After 15 years of working with the most sophisticated version control system ever invented, I went to work for a place that was using old-fashioned, out-of-date, feature-free CVS. It was another shop with about two dozen developers. How, I asked, did the developers create their own branches and do the merging, since CVS makes creating branches difficult and merging is poor? The answer: "We all work off of the trunk."

I couldn't believe it. What about problems with developers creating incompatible code? What about changes that might break the build? How can two dozen people share the same branch? Even worse, it's the branch that the code is actually released from.

To my absolute surprise, it worked. It worked better than I’ve ever seen development work. Builds were done whenever a change was made. If the build failed, you had to undo your change and try again later. Developers made small changes and coordinated their work. No one tried rewriting a complete section of classes. QA got builds throughout the entire life cycle, and the software always worked.

By using a centralized source repository, the developers were forced to work together. They talked to each other, no one tried to bite off more than they could chew. Everyone checked in their code almost every day.

There was a recent study on traffic lights vs. traffic circles. On the face of it, traffic circles would seem more dangerous because of all the merging. But there are actually fewer accidents in a traffic circle than at a comparable traffic light. Drivers in traffic circles have to be careful. They watch what is going on around them and are more observant. Traffic lights – besides causing frustration – make people more careless because they simply assume that they have the right of way.

The same is true of centralized version control systems. Developers are forced to take small steps and work closely with their fellow developers. You can't go off in a corner and make a masterpiece of programming art. You have to work with everyone. Everyone sees everything you're doing. You can't hide.

So, why does Linus Torvalds prefer something like git – a decentralized version control system? Because Linus never made a promise to a customer about what the Linux kernel would contain. If you make a large set of massive changes in your own private repository, good luck getting that incorporated into Linux.

Distributed version control systems offer great power and flexibility, but they can cause a breakdown in the discipline needed to produce large commercial projects with hard release requirements and specific deadlines.

I’d much rather have empty stubs and basic API skeletons in place than nothing at all. I can integrate my code against stubs. I can do code review on stubs. I can even help you build out the stubs!

I tend to work by TDD, which sounds like the opposite of what you propose. The first thing I want to write is a test that fails, not an API skeleton (which is probably going to be wrong, anyway). Integrating against a skeleton API is useless, because any skeleton API today is going to be wrong tomorrow when the code is actually written. It wastes your time today (working with stubs that will go away), and wastes my time tomorrow (when I have to refactor your integration points to make any changes to my code).

Put another way, adding mass to a system does not make it faster.

Developers that wouldn’t even consider adopting the old-school waterfall method of software development somehow have no problem adopting essentially the very same model when it comes to their source control habits.

It sounds like you’re proposing the more waterfall-like system: type in the interface, and then fill it out. Besides, real waterfall has cycles (I’ve been there!) on the order of months. Waiting 2 days for my interface to settle down is a couple orders of magnitude faster than that.

Also, the people you quote don't mention a day – they simply say "often." It feels as if you're putting words in their mouths.

Do you have actual problems with people waiting more than a day to check in? I’d be curious to hear about your team: how many people total, how many working on a feature, etc. I simply can’t imagine how checking in code every 2 days gives you serious integration headaches.

I'd check in hourly - but the problem is that our trunk is built by an automatic build server, and many of my check-ins would lead to either build failures or builds that are horribly broken when used. And there is no way to tell whether a build is a good one or a bad one.

There is no risk of losing data if you make sure your code always has an external backup. And that's leaving aside the fact that the chance of data loss is very small nowadays anyway.

Okay, you should maybe not forgo a check-in for very long. If you add new features, like you said, you can already put in some stubs. But if you touch previously working code and temporarily bring it into a non-working state (which pretty much describes my daily work), checking in too often will not only cause build regressions for several other developers; since our tool is very low-level, it might cause everything from app crashes to system freezes if they work with my not-yet-ready-to-use code.