Let's Play Planning Poker!

From what I’ve read about planning poker, having around seven players seems to be the sweet spot.

I’m on a team of only 3 people. We tried a round of planning poker, and my teammates pretty much just went with what I picked (I’m the “lead”). One guy is fairly new, so he wasn’t able to contribute much. That leaves only 2 people discussing the numbers. If the “moderator” didn’t participate, that’d leave 1.

Any thoughts on what we could do to improve our estimates? I thought of just pulling in people from other teams, but they’d just be pulling numbers out of their whatsit.

We use a formula:

Actual Time + (Worst Case Scenario Time / 4) + Actual Time

It actually comes really close.
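For what it’s worth, here’s that rule as a quick sketch (the grouping and the sample numbers are my own reading, not the poster’s):

```python
# A sketch of the padding rule above; the grouping is my assumption.
def padded_estimate(actual_days, worst_case_days):
    """Double the honest estimate, plus a quarter of the worst case."""
    return actual_days + (worst_case_days / 4) + actual_days

print(padded_estimate(4, 8))  # a 4-day task with an 8-day worst case -> 10.0
```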

"You want someone to say between 10,000 and 12,000 for instance because you can use this answer.

I realise it could be wrong. I think I am missing the point here.

Surely an estimate is a narrow range and a good estimate is a narrow range that is also pretty accurate. That comes with experience."

You are missing the point. An estimate that does not contain the correct answer is useless. We’ve been trained to see a narrow range as “good” and a broad one as “bad”. A narrow range isn’t accurate, it’s precise. And precision is useless if you are precise about the wrong thing.

Let’s use your sun example. Using the first estimate of 100 C - 1,000,000,000 C to build a ship to travel to the sun will yield a ship that can withstand the heat, because we’ve built it to withstand temperatures from 100 C to 1 billion C.
Using the second estimate, if the temperature of the sun is 20,000 C, your pilots fry, because your ship won’t be built to withstand the temperature; you worked under the assumption that the sun would never go above 12,000 C.

Same thing for estimating project time. If you are unsure, use a really broad range (1 day - 1 week) to convey that uncertainty. That way, when you come in at 3.5 days, no one is surprised and you are now ahead of schedule.

This is not a signal to reduce your estimates, however. If you cut all of your estimates in half because of this one data point, you are going to end up extremely late, because the items that were going to run over their estimated times will now really run over.

The rule of thumb is for your range to cover 80% of the probability: you should have about a 10% chance of making your low estimate and about a 90% chance of making the high one.

Your end schedule should have about a 90% chance of being met as well. And estimate as finely as you can: if a task can be broken up into two discrete sections, estimate each individually.

The finer the pieces, the better the estimate. The more data you collect, the better you will be at estimating.
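A minimal sketch of checking that calibration against your own history (the function and the data are invented, purely illustrative):

```python
# Check how well past (low, high) ranges were calibrated against actuals.
# Targets from the rule of thumb: ~10% beat the low bound, ~90% make the high.
def calibration(history):
    """history: list of (low, high, actual) tuples, all in the same unit."""
    n = len(history)
    beat_low = sum(1 for low, _, actual in history if actual < low)
    made_high = sum(1 for _, high, actual in history if actual <= high)
    return beat_low / n, made_high / n

history = [(1, 5, 3.5), (2, 8, 9), (1, 3, 2), (4, 10, 7), (2, 6, 5)]
p_low, p_high = calibration(history)
print(f"{p_low:.0%} beat the low bound (target ~10%), "
      f"{p_high:.0%} made the high bound (target ~90%)")
```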

It’s all in the book.

“Delphi” is almost certainly a reference to the Delphi Method, not the Delphic Oracle. Using the Delphi method, a number of individuals give separate and independent opinions, get feedback on the views of others, give opinions again, and eventually converge, without ever having any face-to-face confrontations. It’s a good heuristic for reaching agreement through impersonal debate, and it avoids problems like groupthink.

The Delphi method supposedly dates back to about 200 BC, when some king wanted biblical writings translated into Greek. The king got a bunch of scholars, locked them in separate rooms, and had them prepare translations. When they were finished, they were all identical. The king was amazed, but a rabbi who looked at the translations said that getting n identical independent translations wasn’t a miracle; getting n scholars in the same room to agree on a single translation would have been a miracle. This is probably just a good joke rather than the real origin of the method, but it’s a nice example of how it works.

“Agile Estimating and Planning” by Mike Cohn also does a good job discussing Planning Poker. Cohn also sells card decks on his web site.

It’s important to make the distinction between estimating stories based on relative size and estimating the hours a task will take. Size is “fixed”: a pile of dirt is the same size no matter who is shoveling it, but some people will take longer to shovel it than others.
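A tiny illustration with invented numbers: the pile’s size stays fixed, and the duration falls out of who is doing the shoveling.

```python
# Same pile, different shovelers: size is constant, duration is not.
PILE = 8  # story points -- a relative size, not hours

rates = {"veteran": 4.0, "new hire": 2.0}  # points per day, invented numbers

for person, points_per_day in rates.items():
    print(f"{person}: {PILE / points_per_day:.1f} days for the same pile")
```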

I am surprised no one has mentioned use case estimation models.

In most professional services shops (that I have worked for, anyway) where fixed-price estimates are the rule, software requirements are typically documented via use cases, which feed into one of these use case estimation models that can be tuned for complexity, technical factors, team experience, history, etc. Once “calibrated”, it turns out to be reasonably accurate. “Reasonably” being the key word.

Most importantly, it becomes a predictable and repeatable process. And the last time I played poker, the only thing predictable or repeatable about it was losing my you know what :wink:

You want to see the level of detail in collecting time and size measures? Download Watts Humphrey’s Personal Software Process (PSP): http://www.sei.cmu.edu/pub/documents/00.reports/pdf/00tr022.pdf

Interesting timing… I was looking at the Evidence Based Scheduling feature of FogBugz just yesterday: read some of the forum entries, and watched the video, too.

The thing that I didn’t see addressed was task dependencies. What if some of Jane’s tasks depend on one of Milton’s being complete? This clearly means that Jane’s completion date isn’t solely based on the accuracy of her estimates, or even the estimates after adjusting for historical data. She simply can’t start until Milton is finished.

This may not be the right forum in which to ask, but does anyone know if the FogBugz EBS feature addresses task dependencies?

Tighter estimates are better, right?
http://blogs.msdn.com/philipsu/archive/2006/06/14/631438.aspx

As part of the agile process there is the concept of ‘velocity’. This is based on historical data and basically uses the results of planning poker to determine how bad we are at estimating.

So after a few iterations we can see that when the group predicts something is going to take 5 days, it actually takes 8, or some such.

This is a nice, simple approach to getting historically informed estimates. The things it doesn’t deal with that well are blowouts (where a single piece of work takes considerably longer than expected) and dependencies (though from a statistical point of view I don’t think those should really matter all that much).
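A rough sketch of that idea (names and numbers are mine, not from any particular tool): derive a correction factor from past iterations, then apply it to fresh planning-poker estimates.

```python
# Derive a velocity-style fudge factor from (estimated, actual) history.
history = [(5, 8), (3, 5), (8, 12)]  # (estimated days, actual days)

fudge = sum(a for _, a in history) / sum(e for e, _ in history)
print(f"fudge factor: {fudge:.2f}")  # ~1.56 -- we run over by about half again

new_estimate = 5
print(f"an estimated 5 days ~ {new_estimate * fudge:.1f} likely days")
```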

FogBugz estimating doesn’t support dependencies. They have a story about why dependencies don’t matter for software development estimating.

If you want to build a space satellite, you need to worry about dependencies, create Gantt charts, and find the critical path. But for software development, you can probably assume that no one on the team will stand idly by waiting for the dependencies of “their” part to be satisfied. Just break the entire project into parts no longer than two days each and add up the number of days. At least that’s the argument. Probably a good argument for a seasoned team working on the next release of a product.

I enjoy planning poker and have seen a lot of benefit from it when the entire team really participates. We all suck at estimating, and we all have some historical data from past tasks that have been completed. Where we get into trouble is that just like the stock market, past performance does not necessarily reflect future results.

A future task often involves new approaches to an old problem and may take much longer than previous approaches until the developer becomes more comfortable with a new technology. Indeed, discussions often turn to asking why a particular task exists if we find that it is easy to estimate based on historical data. The team often wonders if such a task is necessary at all.

Another influence on poor estimation I notice is the human aspect. Some historical data may have been generated during a time when the developers were particularly “on fire”, and they may not be as energetic, motivated, or diligent in the future, or vice versa. Even the frame of mind at the time of estimation comes into play; how someone feels can dramatically influence the estimate. The developer may be in a very different mental state when the task is actually performed. Planning poker helps mitigate these all-too-human influences.

Finally, I see team members lose interest in estimating at all when features are added to a project as quickly as they are completed. We see velocity and project completion dates actually diverge, and the attitude quickly becomes one of “just get it done”. When I see this happen, I find more and more of my time is devoted to ensuring that historical data is still collected as accurately as possible, especially as it predicts future ship dates moving to coincide with Duke Nukem Forever’s.

“Collective Intelligence” is a key benefit of Planning Poker. In “The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations,” James Surowiecki does a great job of discussing the many ways that we are collectively better at some tasks than we are individually. For example, there is the phenomenon that when guessing how many small objects like marbles are in a jar, the average of all the guesses is often more accurate than any individual guess. How often has estimation felt much like guessing how many marbles are in the jar?
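A toy simulation of the marble-jar effect (entirely made up, not from the book): each guesser is noisy but unbiased, and the crowd’s average usually beats most individuals.

```python
import random

TRUE_COUNT = 850  # marbles actually in the jar (invented)
guesses = [random.gauss(TRUE_COUNT, 200) for _ in range(50)]

crowd = sum(guesses) / len(guesses)
crowd_error = abs(crowd - TRUE_COUNT)
better = sum(1 for g in guesses if abs(g - TRUE_COUNT) < crowd_error)
print(f"crowd guess {crowd:.0f}, off by {crowd_error:.0f}; "
      f"only {better} of {len(guesses)} individuals did better")
```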

I’m just going to chime in with Mitch’s earlier reference to the PSP text. While nowhere near as delightful to read as Steve McConnell’s work, it does delve deep into the math. I worked on a team that used it for 4 years, and it didn’t take long for the results to improve.

Our management used MS Project to handle task dependencies, and as Howard pointed out, there’s always something to do, so keeping everyone busy was rarely a problem. In fact, it turned into growth opportunities for some, because there was time available for them to take on tasks more challenging than the ones they were originally scheduled for.

This stuff works. Most people balk at collecting the data. Tools help, but as Jeff said, it still takes discipline. But, you might be surprised how many fewer times your boss interrupts you when your data shows those interruptions (or the tasks associated with them) are what caused your task to be late! (Or, sadly, if you live in Dilbert’s world, the boss might say to stop tracking your time like that, because it only shows you can’t get done the things he wants you to do.)

A friend once told me that to estimate the time of any software project, take your best guess, double it, and move it to the next time unit up.

So two hours become four days, and a day becomes two weeks. Get it?
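Tongue in cheek, but the rule is mechanical enough to code (the unit ladder is my own invention):

```python
# Double the guess, bump the unit: the friend's rule, literally applied.
UNITS = ["minutes", "hours", "days", "weeks", "months"]

def friend_rule(amount, unit):
    return amount * 2, UNITS[min(UNITS.index(unit) + 1, len(UNITS) - 1)]

print(friend_rule(2, "hours"))  # -> (4, 'days')
print(friend_rule(1, "days"))   # -> (2, 'weeks')
```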

“OK. In order to have 90% confidence that the result will be in range, each estimator plays both their 1 and 100 cards.”

Read step 4. You only “play” a single card, which you think best represents the size of the problem.

Also note that relative sizing is typically used: playing a 2 doesn’t mean two days. It just means you think this task is bigger than the 1 tasks, but smaller than the 3.

If each estimator plays a single card, then no estimator has a 90% chance of being right (on average). Even if you are just comparing the costs of two tasks, rather than estimating the orders of magnitude of difference between them, no estimator is going to get 90% of them right (on average).

OK, throw out yesterday’s book on estimating and buy today’s book. Tomorrow, we’ll throw out today’s book and buy tomorrow’s book. As much respect as I used to have for McConnell’s ideas, I’m sure glad I didn’t waste money on that book.

We tried this just the other day in our five developer shop.

My first impression was that it was kind of hokey, but as we went along I came to like the process. There really is a ‘wisdom of crowds’ effect. And it’s fun.

If you’re curious about how FogBugz handles estimation, you can watch the Austin leg of Joel’s tour here:

http://www.joelonsoftware.com/items/2007/10/24austindemo.html

I attended the Emeryville, CA presentation. It was great to meet Joel in person, and I was very impressed with FogBugz, particularly the estimation and timesheet stuff.

Joel goes into more detail on evidence-based scheduling:

http://www.joelonsoftware.com/items/2007/10/26.html

This technique is common in the Agile world, and is usually used with story points as the unit, rather than any unit of time. The rationale is that developers are better at comparing complexity of tasks than estimating the time they will take to complete them.

Once you have a couple of iterations under your belt, you can start looking at the velocity (story points/iteration, say) and have a look at where you’ll end up. You can also use the best and worst 4 (or so) iterations to get a range of delivery dates, if you wish. It’s also then relatively easy to predict the effects of feature creep, and the like.
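Here’s a sketch of that best/worst-iterations trick (the structure and numbers are assumed, not taken from any particular tool):

```python
# Project a delivery range from your 4 slowest and 4 fastest iterations.
velocities = [18, 22, 15, 25, 20, 17, 23, 19]  # story points per iteration
remaining = 120                                # points left in the backlog

k = 4
slowest = sum(sorted(velocities)[:k]) / k   # mean of the 4 worst iterations
fastest = sum(sorted(velocities)[-k:]) / k  # mean of the 4 best iterations

print(f"roughly {remaining / fastest:.1f} to {remaining / slowest:.1f} "
      f"more iterations to finish")
```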

However, I’m not so convinced of the value of historical data. Story points measure the relative complexity of a set of tasks. So unless you’re always comparing against some sort of standard, I would have thought a history of estimates, progress, etc. on other projects would be of limited use.

As indicators that can be used to improve your process? Possibly. To show the effect of good and bad decisions? Sure. But as a basis for estimating other projects? Hmmmm, dunno. Jon makes some good points about this.

However, I am completely convinced of the utter uselessness of Gantt charts for anything more detailed than product roadmaps. They just cannot model software development processes accurately, especially anything remotely Agile.

Worse, they generate useless, yet comforting, information, especially with regard to critical paths. In a well designed piece of software, there just aren’t many actual dependencies. If there are, it isn’t well designed. So usually, any “critical path” a Gantt chart shows is mostly an artifact of the order you decided tasks should be done.

For software projects, bin MS Project, it’s completely counter-productive.