How Good an Estimator are You? Part III

For the final installment in the How Good an Estimator Are You? series, I'd like to start with an anecdote from chapter 7 of Software Estimation: Demystifying the Black Art:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/07/how-good-an-estimator-are-you-part-iii.html

I thought you said not to go to Google to find the answer =oP

You shouldn’t search for the answer, but I think it’s fair to search for related facts that help you estimate an answer, such as “what is the population of Chicago?”

For example, on the original quiz question of “number of book titles published in the U.S. since 1776”, I think searching for US Census population figures is fair. You could then estimate the number of books published per person every 10 years at some fixed rate.
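To make the arithmetic concrete, here's a minimal sketch of that approach. Every number in it (the population figures, the per-capita publishing rate) is an illustrative guess, not a researched value:

```python
# Back-of-the-envelope sketch: book titles published in the U.S. since 1776,
# estimated from rough population figures and a guessed per-capita rate.
# All numbers below are illustrative assumptions, not researched data.

# Approximate U.S. population (in millions) at a few representative decades.
population_millions = {1780: 3, 1850: 23, 1900: 76, 1950: 151, 2000: 281}

# Guess: titles published per million people per decade (pure assumption).
titles_per_million_per_decade = 500

decades = sorted(population_millions)
total_titles = 0
for start, end in zip(decades, decades[1:] + [2010]):
    decades_in_span = (end - start) // 10
    total_titles += population_millions[start] * titles_per_million_per_decade * decades_in_span

print(f"Rough estimate: {total_titles:,} titles")
```

The point isn't the output, which is only as good as the guesses going in; it's that each assumption is written down where it can be questioned.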

As the son of a piano tuner, Jeff, I have to point out that adding mathematical operations to guesses does not make for a valid estimation, and there are a huge number of variables which will skew your result by at least an order of magnitude.

  • One in five families owning a piano? Not even close. The proportion of families that actually own pianos is far lower than that.

  • 1,000 pianos in a year is a fairly ambitious goal for any piano tuner, considering that each piano can take up to 2 hours to service, and that piano tuners travel from piano to piano, introducing the element of commute time. Four pianos in a day, with a 30-minute drive between each site, plus a lunch break, means that these piano tuners probably don’t have families.

  • Most pianos are not owned by families, but rather by schools, churches, and musical societies. Any measurement of how many piano tuners a city can support must necessarily include an estimate of the market for the arts, evangelical/charismatic religions (i.e., religions which use music in their worship), as well as the level of inclusion of music within the local school districts’ curriculum (which must also include the ratio of charter schools to public schools, and how well-funded the school systems are).

  • Many school districts in urban areas employ full-time piano tuners. The San Francisco Chronicle featured a story a month or so ago about the tuner for the Oakland school district, who was responsible for the 500+ pianos owned by the school district.

Luckily, we have a better option than starting from the population of Chicago and bluffing our way to a nice, round base-10-ish number. Piano tuners advertise: (a link to Chicago’s Yellow Pages listing for piano tuners, blocked for questionable content)

Now you just have to figure out how many piano tuners the average piano tuning business employs, and factor in those tuners in full-time jobs.

Or you could check how many people are card-carrying members of the Piano Technicians Guild, Chicago chapter: http://www.ptg.org/chapters.php?Chapter=601&action=list

(That’s a better estimate of the number of folks in Chicago you should trust with your piano: 37.)

So I’m going to go with 50 as an upper bound, including unregistered, apprentice, and part-time tuners.

Guesstimation is no match for measurements and a decent knowledge of the problem domain.

(If you had said plumbers or tree doctors or mice, I totally would have passed this over. :wink:)

Guesstimation is no match for measurements and a decent knowledge of the problem domain.

Coda hit the nail on the head.

He makes a great counterpoint to Jeff’s (Fermi’s) example, where you can fool yourself into thinking that, by following a process of divide and conquer (and guess), your estimate is somehow relatively accurate.

Multiplying even a few uncertain numbers together quickly leads to a nice looking number that is essentially meaningless.

And even historical metrics from an organization are of questionable value: has the team changed? are they using different tools? is the architecture different? has the development process changed? etc… Are any of those historical metrics even relevant anymore?

I’m not saying estimation is futile, but I think “back of the envelope” shortcuts that give the perception of applying a disciplined process probably fare no better than numbers out of thin air.

The only approach I’ve used that held any sort of promise involved following another of McConnell’s best practices: Mini Milestones (from Rapid Development). You break a project down into tasks that take 2 days or less, and each task is either incomplete or complete (never 70% done or something vague like that).

Each developer in the group would do this independently, and then we’d go through each plan and merge them into one “grand unified plan”. When our estimates varied wildly, we’d talk through our assumptions (that’s usually where the variation lay) and work toward consensus.
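A minimal sketch of what that merging step might look like in code; the task names, developers, and divergence threshold are all made up for illustration:

```python
# Hypothetical sketch of merging independent per-task estimates (in days,
# tasks already broken down to 2 days or less) and flagging the ones whose
# spread suggests the developers are making different assumptions.
# Names, numbers, and the threshold are invented.

from statistics import mean

estimates = {
    "parse import file":   {"alice": 1.0, "bob": 1.5, "carol": 1.5},
    "position reconciler": {"alice": 2.0, "bob": 0.5, "carol": 2.0},
}

DIVERGENCE_RATIO = 2.0  # max/min ratio that triggers a discussion

for task, by_dev in estimates.items():
    values = list(by_dev.values())
    if max(values) / min(values) >= DIVERGENCE_RATIO:
        print(f"DISCUSS  {task}: {by_dev}")        # talk through assumptions
    else:
        print(f"OK       {task}: ~{mean(values):.1f} days (consensus)")
```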

And then, we’d vastly miss our target because in the environment we worked in (a hedge fund) it was impossible to get long chunks of uninterrupted time to actually get anything done.

Oh, well.

Multiplying even a few uncertain numbers together quickly leads to a nice looking number that is essentially meaningless.

Not necessarily; Coda is proposing better data points to base our estimate on. The Fermi method is only supposed to produce an estimate within an order of magnitude, e.g., 50 piano tuners vs. 500 piano tuners.


Fermi was known for getting quick and accurate answers to problems which would stump other people. The most famous instance came during the first atomic bomb test in New Mexico on July 16, 1945. As the blast wave reached him, Fermi dropped bits of paper. By measuring the distance they were blown, he could compare to a previously computed table and thus estimate the bomb energy yield. He estimated 10 kilotons of TNT, the measured result was 18.6. This method of getting approximate and quick answers through “back of the envelope” calculations became informally known as the Fermi method.

Obviously if you have better, more accurate data points, you use them to base your estimate on. As a project progresses, for example, you use the real data gathered during your project (versus the Fermi-style estimates you have at the beginning) to estimate the rest of it. That’s another way to narrow the cone of uncertainty.
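A toy illustration of that narrowing, with invented numbers: once you have actuals, you extrapolate the remainder from the observed rate rather than from the original guess.

```python
# Toy sketch (all numbers invented): mid-project re-estimation using the
# rate actually observed so far instead of the original up-front guess.

initial_estimate_days = 120     # the back-of-the-envelope guess at kickoff
tasks_total = 60
tasks_done = 20
days_elapsed = 50

observed_days_per_task = days_elapsed / tasks_done          # 2.5 days/task
remaining = (tasks_total - tasks_done) * observed_days_per_task

print(f"Original guess:     {initial_estimate_days} days")
print(f"Revised projection: {days_elapsed + remaining:.0f} days")
```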

Okay, so fiddling with bits of paper while your colleagues are placing bets as to the general flammability of the atmosphere vis-à-vis a nuclear explosion gets you within an order of magnitude, more or less. I’ll buy it; an order of magnitude is a big target.

What I’d like to know is how this stacks up against any other method of estimation: totally uninformed guess by a single person, totally uninformed guesses by groups of people, throwing dice until you get a one, numerology, and, of course, a cryptographically secure pseudorandom number generator choosing both values and units of measurement.

I suppose I’m not content with estimating the estimative accuracy of an estimation methodology; I’d like to measure it down to a specific confidence level. And, because I expect the universe to be a profoundly weird place, I’m putting my money on the CSPRNG. (Though I may be an order of magnitude off.)

But more seriously, I’m interested in seeing you play out an example of this using something which is a bit more real-life, Jeff. I’m a horrible estimator (thus my love for measurements), especially when it comes to how long it takes to implement a specific set of functionality. What kind of numbers would you put on the back of your envelope there? How would you smash them together? How much confidence would you place in the result?

(Even then, I doubt my employer would be satisfied with “Well, it’ll take anywhere from a month to ten months to implement this. I’ll let you know in two weeks,” seeing as how he hasn’t been terribly receptive to “I haven’t the faintest idea. Let’s give it a shot.”)

It’s definitely an interesting subject, and it forms the basis for a lot of the assumptions of various development methodologies.

Notwithstanding Coda’s comments about the accuracy of the variables, the estimate is fatally flawed in my view because it ignores a key variable entirely. If you say:

(total pianos) / (tunings per year per tuner)

The answer is how many tuners you need, but only if every piano gets tuned once a year. (Did I say that right?) Let’s say a typical Chicago piano gets tuned once every 3 years. Your estimate will be out by a factor of 3.
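Here is the same Fermi-style calculation with that missing variable included. The inputs are round guesses chosen to match the thread's arithmetic (an original figure of 150 tuners, corrected by the once-every-3-years factor), not measured data:

```python
# Fermi-style sketch of the Chicago piano tuner count, including how often a
# piano actually gets tuned. All inputs are round guesses from this thread,
# not measured data.

pianos_in_chicago = 150_000          # rough guess
tunings_per_piano_per_year = 1 / 3   # a typical piano tuned every ~3 years
tunings_per_tuner_per_year = 1_000   # ambitious, per the comment above

tunings_needed_per_year = pianos_in_chicago * tunings_per_piano_per_year
tuners_needed = tunings_needed_per_year / tunings_per_tuner_per_year

print(f"Estimated tuners: {tuners_needed:.0f}")  # ~50, vs. 150 if you assume yearly tuning
```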

Same estimate two ways. Interesting.

Well, I don’t think the fact that two different procedures arrive at similar values is any reason to think that the procedures themselves are good. Let’s say my procedure to estimate the number of piano tuners is to count the number of stars on the American flag.

Same estimate three ways. Random.

What people are missing here is that there is no real science in software engineering. Yes, you can observe the distance bits of paper are blown in a blast and work backwards to an estimate of the intensity of the blast, but can you really take the number of function points (what exactly are these again?) from a project your firm did two years ago, triple that for some different person’s measure of function points on a new project and come up with an estimate that has any meaning at all?

Have we learned nothing from Brooks over the last 30 years?

Interesting convergence:

Applying Dominic’s correction for pianos being tuned less often than once a year to the original number: 150 / 3 = 50 tuners.

Coda says, after using a rather different approach:
"So I’m going to go with 50 as an upper bound, including unregistered, apprentice, and part-time tuners.

Guesstimation is no match for measurements and a decent knowledge of the problem domain."

Same estimate two ways. Interesting.

Any discussion about estimating has to reference “The Wisdom of Crowds” http://www.randomhouse.com/features/wisdomofcrowds/

So the more developers there are, the better the project estimates should be…

there is no real science in software engineering… Can you really … and come up with an estimate that has any meaning at all?

That single project represents only one data point. What you really want to do in order to predict more accurately is solidify the firm’s software process from beginning to end and then measure as many aspects of as many projects as is practical. Applying statistical analysis will then give you a reasonable estimate, and eventually excellent estimates, of both the absolute minimum and maximum times and the probable minimum and maximum times (roughly the 4th and 1st standard deviations).
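As a rough sketch of what that statistical analysis could look like (the historical overrun ratios here are invented purely for illustration):

```python
# Hypothetical sketch: use the actual-to-estimated effort ratio from past
# projects (invented data) to turn a raw estimate into a calibrated range.

from statistics import mean, stdev

# actual_days / estimated_days for completed projects (made-up numbers)
overrun_ratios = [1.1, 1.4, 0.9, 1.6, 1.3, 1.2]

mu = mean(overrun_ratios)        # average overrun factor
sigma = stdev(overrun_ratios)    # spread of overrun factors

raw_estimate_days = 40           # raw estimate for the next project

print(f"Likely:      {raw_estimate_days * mu:.0f} days")
print(f"Optimistic:  {raw_estimate_days * (mu - sigma):.0f} days")      # ~1 std dev below
print(f"Pessimistic: {raw_estimate_days * (mu + 2 * sigma):.0f} days")  # ~2 std dev above
```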

As far as your science comment goes, that could be argued successfully from either side; however, software development is a relatively new thing. The goal of Software Engineering is to make delivering products as predictable, manageable, and streamlined as possible. This is essentially the same goal as all other engineering disciplines. The only difference is that a software project doesn’t end up with a building or a chemical plant or an airplane. It ends up with a payroll system, a customer accounts system, a space shuttle control system, or a supermarket point of sale application. An engineering discipline harnesses the knowledge gained from past projects (statistical analysis) coupled with advances gained through research (science) to ensure success.

Of course, this approach banks on the past being a decent indicator of the future, something that is generally true but may not be in a particular case. If a software house does not experience huge shifts in management philosophy, culture, personnel, or software process, statistical analyses should provide useful estimates.

Finally, to beat this thing to death, here’s an interesting thought experiment: If you were to start constructing buildings today, how many buildings would you have to build before you were able to accurately estimate the major metrics for an upcoming contract? Major metrics being: Labor, Materials, Date of Delivery. How would you go about refining your estimates? If your firm expanded, how would you ensure that your employees with less experience than you could create decent estimates? Since software development firms are just beginning to specialize in the same way construction companies are, assume that most projects you take on will be different in key ways from what you’ve done already. How does that change what you would do?

Fifteen years ago, a French methodology teacher taught me a good technique: take any method from the literature, then multiply the estimate by 2.
That gives you the minimum duration of your project.

He called this method “le pifomètre”, something like “the randometer”…

A friend of mine must have had the same methodology teacher. His technique for estimation was this:

  1. Make your best estimate.
  2. Double it.
  3. Move up to the next larger unit of measure.

Thus, an estimate of 1 day becomes 2 weeks :smiley:

Alfred Korzybski, the inventor of general semantics, explained this problem well in his time. When we study a question outside our field of competence, we proceed by inference (generally accepted ideas) instead of reasoning analytically.
One of his basic ideas is that between the object or phenomenon we perceive and the idea we form of it, there intervenes a cascade of abstractions, during which, without our knowledge, we strip away properties of the object or phenomenon.
A very useful tool bequeathed by general semantics is the language E-Prime.
I frequently use it during my analysis and estimation work. It consists of removing the verb “to be” to avoid simplistic, reductive identifications, such as: such a thing “is” like that, or “is not” like that.

This website has a formal proof related to the difficulty of software estimation
http://scribblethink.org/Work/Softestim/kcsest.pdf

It’s something we all know, but now it has been stated mathematically.

Joe: multiplication is associative.

you should avoid estimating what you can count
Similarly, you should avoid counting something just because you can.

There are two maxims in business:
You can’t manage something you don’t measure.
What you measure gets attention.

They aren’t the reverse of each other. For instance, the second indicates that even if you don’t care how many lines of code each developer writes, if you record it (and developers are aware of this), their coding style will change (to either reduce or increase lines). Neither reducing nor increasing the length of code actually adds to the value or quality of a feature, so that’s just silly (but counting lines of code has been a standard in decades past).

You make an assumption (that may or may not be true, I have no idea) that pianos only need to be serviced once a year.

Handforged

Wouldn’t it be:

  1. Make your best estimate.
  2. Move up to the next larger unit of measure.
  3. Double it.

Thus, an estimate of 1 day becomes 2 weeks :smiley:

Swapping 2 and 3 still gets you two weeks; either order comes out the same.
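A trivial check of why the order doesn't matter, taking a week as 5 working days (an assumption):

```python
# Toy check: "double it" and "move up a unit" are both multiplications, so
# their order doesn't change the result. A week is assumed to be 5 working days.

WORKING_DAYS_PER_WEEK = 5
estimate_days = 1

# Double first, then reinterpret days as weeks: 1 day -> 2 days -> "2 weeks".
double_then_bump = (estimate_days * 2) * WORKING_DAYS_PER_WEEK

# Reinterpret first, then double: 1 day -> "1 week" -> 2 weeks.
bump_then_double = (estimate_days * WORKING_DAYS_PER_WEEK) * 2

assert double_then_bump == bump_then_double == 10   # both: 10 working days = 2 weeks
print("Either order lands on 2 weeks (10 working days).")
```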