Diseconomies of Scale and Lines of Code

Steve McConnell on diseconomies of scale in software development:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/07/diseconomies-of-scale-and-lines-of-code.html

I think LOC can be useful when you use it to gauge relative measurements.

As you said, it wouldn’t make sense to compare a C project with a Ruby project. But when comparing C to C, then it’s more valid.

Perhaps the LOC should be qualified.

  • LOC (Ruby)
    or
  • LOC (Scripting) vs LOC (Managed)

Something like that.
Phil

Disclaimer: I think Philip Su is an asshat. In my opinion, he’s more dangerous to Microsoft than the mini-msft blog. Look closely at what he said in his Vista post that I linked above:

http://blogs.msdn.com/philipsu/archive/2006/06/14/631438.aspx

Vista is said to have over 50 million lines of code, whereas XP was said to have around 40 million. There are about two thousand software developers in Windows today. Assuming there are 5 years between when XP shipped and when Vista ships, those quick on the draw with calculators will discover that, on average, the typical Windows developer has produced one thousand new lines of shipped code per year during Vista. Only a thousand lines a year. […] realize that the average software developer in the US only produces around (brace yourself) 6200 lines a year. So Windows is in bad shape – but only by a constant, not by an order of magnitude.

The chart McConnell provides in Software Estimation defines a COCOMO average of 1,600 LOC per staff year for a 10 million LOC project.

So I think 1,000 LOC per staff year for a 50 million LOC project is, at the very least, average. I suspect, however, it is above average.
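
Just to sanity-check those figures, a quick back-of-the-envelope calculation (a sketch in Python, using only the numbers quoted above):

new_loc = 50_000_000 - 40_000_000     # Vista LOC minus XP LOC, per Su's figures
devs = 2_000                          # Windows developers
years = 5                             # roughly XP ship to Vista ship

per_dev_year = new_loc / (devs * years)
print(per_dev_year)                   # 1000.0 -- Su's "only a thousand lines a year"

cocomo_avg = 1_600                    # McConnell's COCOMO average for a 10 MLOC project
print(per_dev_year / cocomo_avg)      # ~0.63 of the 10 MLOC average -- and Vista is five times that size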

And writing an operating system is quite a challenge, among the more difficult of software projects, which would naturally yield a lower LOC count per staff year than, say, a website or line-of-business app.

So, in summary, Philip Su is an asshat.

But what I really want to focus on here is how you measure a project’s size. What’s big? What’s small?

I build internal web apps and I’ve never really thought about what tells me if a project is big or small. But I know it’s not lines of code.

The two things I use to gauge the size of my projects are 1) data and 2) people.

People probably outrank data, because the people tell me how many ways they need to use the data. If there need to be half a dozen different interfaces, that’s a bigger project, because the interfaces end up being the most time-consuming aspect of development. A smaller user base means a smaller project.

Data size (and nature) is the other factor for me. Obviously, the more data there is the more coding has to be done to manage and interact with it. But the nature of the data also tells me something about how careful I need to be with the data.

I’ve never really thought about how many lines of code any of the projects require. I’m now interested enough to look at some and see just how outrageous it is.

Of course, as they are web apps, it comes back to your question of what counts as a line of code. I write multiple languages in a single file…CSS, HTML, JavaScript…should each language’s lines be counted? If so, that just triples my LOC count.

Great topic of thought…

The ternary operator is an excellent case for showing the limitations of LOC in estimating ‘code length’.

One thing you quickly realise with Software Metrics is that they are only as good as the measurer. One must accept the limitations of each metric and interpret the measures with respect to those limitations in order to draw some useful conclusion on the notion of ‘Size’.

LOC is an obvious ‘length’ metric but if it is used by itself it reveals little on complexity, functionality, redundancy and reuse within the code, therefore it is limited in its usefulness in attempting to understand Productivity, Effort and Cost which are the main drivers behind Project Size.

And then there are also those artifacts which aren’t as obvious but drive up Size, such as Specifications, Design documents, etc. There isn’t even a great consensus on how these are measured as yet.

You’re totally right, Philip Su has blatantly compared Apples with Watermelons.

I think a project’s size could be better determined by the number of decisions made, similar in spirit to the way big-O notation counts operations.

Determining how many hundreds of decisions are made not only in the code, but during design could be a useful factor in measuring complexity. It would take some doing to really take this idea and flesh out a useful metric, but it’s got more basis than merely lines of code.

Josh, that’s Cyclomatic Complexity (McCabe) and is quite useful. However, it is less useful if you have a lot of case statements, and it doesn’t cover everything that makes OO code complex very well. But it is definitely useful IMO.
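
If anyone wants to play with Josh’s idea, McCabe’s number is essentially “decision points plus one”. Here’s a rough sketch of that in Python (a toy keyword scan of my own, not a real tool; a real tool builds the control-flow graph):

import re

# Very rough approximation of cyclomatic complexity for a single routine:
# count the branch points and add one. Note how every 'case' adds one,
# which is exactly the case-statement inflation mentioned above.
def rough_cyclomatic_complexity(source: str) -> int:
    branch_keywords = re.findall(r"\b(?:if|for|while|case|catch)\b", source)
    short_circuits = re.findall(r"&&|\|\|", source)
    return len(branch_keywords) + len(short_circuits) + 1

print(rough_cyclomatic_complexity("if (a > b) { c = a; } else { c = b; }"))  # 2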

Keeping it small certainly reduces your chances of having an outright failure, but it also reduces your ability to do much.

Anyway, in the past I’ve used the number of forms, batch jobs and reports, with a scale from easy/regular/hard/wicked, to work out the size of a particular project. But that only works when you understand the development tools and production environment extremely well, and there are only one or two ‘wicked’ forms, reports or batch jobs.

I’ve used that methodology with web and client/server applications, but it does tend to break down when writing a GUI-free tool, such as a web service or DLL.

LOC is interesting in hindsight, but impossible to predict in practice. It often hides the real complexity of a particular solution, which may have required extensive work to refine down to a few lines of code, or the simplicity of a tedious but easy-to-create procedure with thousands of lines of ordinary drivel (like variable assignments).

When I worked in a consulting organisation, we found it most useful to identify whether the team really understood the business area, and its experience at solving similar problems. Anything new will take an unpredictable amount of time, because (by definition) you don’t know what problems might be lurking around the next corner.

The number of internal dependencies in the software is presumably the critical factor, I’d have thought, and is also unlikely to increase linearly with number of lines of code.
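
To put a rough number behind that intuition: in the worst case, where any module can touch any other, the number of potential interactions grows with the square of the number of parts, not linearly. A toy illustration (assuming that fully-connected worst case):

# Worst-case potential interactions between n modules: n * (n - 1) / 2.
# Real systems are nowhere near fully connected, but the growth is still
# super-linear in the number of parts.
for n in (10, 100, 1000):
    print(n, n * (n - 1) // 2)   # 10 -> 45, 100 -> 4950, 1000 -> 499500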

The other problem with big projects is integration - you’re constantly having to make sure that new APIs etc. that you write are backward compatible, as that’s in practice the only way to integrate new code if you’ve got large numbers of people working on a project. That can easily take as much effort as actually writing the new code in the first place.

Question: how many of the LOC in Vista is “operating system”, “shell”, “web browser”, “games”, “utilities”, “video editing tools”, “DVD/WMA/MP3/MPG/AVI/etc player”, etc, etc, etc?

No, seriously, when Windows XP shipped with an entry-level movie editor application it became hard to call it a single product.

If I was to sit down and plan to write the complete Vista distribution from scratch, there would be a lot of lines of code to write. If I was doing one little section, then there’d be many fewer lines. (Hey, I’ve got an idea for managing complexity - break up a large project into many smaller projects that are complete in their own right, and layer more sophisticated functionality on top of the earlier projects. I wonder if I could patent that non-obvious business process?)

(Incidentally, when counting LOC, it’s traditional to use a tool that processes the source code using standard formatting, so different brace styles and commenting patterns don’t let developers artificially inflate/reduce their LOC count.)
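
As a rough illustration of what such a tool does, here’s a minimal sketch (my own toy counter, not any particular tool; real counters like cloc or SLOCCount are far more thorough) that ignores blank lines, comment-only lines, and lone braces before counting:

# Toy "logical LOC" counter for C-style source: skip blanks, comment-only
# lines, and lines that are just a brace, so brace style and commenting
# habits don't inflate or deflate the count.
def count_logical_loc(source: str) -> int:
    count = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue                              # blank line
        if line.startswith(("//", "/*", "*", "#")):
            continue                              # comment-only line (crude check)
        if line in ("{", "}", "};"):
            continue                              # lone braces carry no logic
        count += 1
    return count

Counted that way, the if/else vs. ternary example later in this thread comes out at four lines against one instead of eight against one.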

It seems to me that the general notion of complexity, as noted several times, is probably the real measure of the size of a project. Numbers of objects in an OOP system, perhaps factored according to the (average?) number of members. The number of dependencies – which entities interact with which other entities; presumably the more inter-entity communication, the more complex and/or bigger the project. The number of functions that the project’s application accomplishes, if there is some way to measure that. Stuff like that, as per what Josh and MoZ are talking about.

Something about the “order of magnitude” comparison between 1000 and 10,000 LOC bothers me; seems too linear. “Size” as a square of the difference? Cube? Is there a logarithmic increase in complexity as a function of LOC?
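
On Mike’s linearity question: the COCOMO-style models McConnell leans on express it as effort growing with size raised to a power a bit above 1 (the classic exponents are roughly 1.05 to 1.20 depending on project type), so the growth is super-linear but nowhere near quadratic. A quick sketch of what that shape means, using illustrative coefficients rather than McConnell’s actual calibration:

# Basic COCOMO-style effort curve: effort = a * KLOC ** b, with b > 1.
# The coefficients below are the textbook "semi-detached" values, used
# only to show the shape of the curve, not to estimate anything real.
a, b = 3.0, 1.12

for kloc in (1, 10, 100):
    print(kloc, round(a * kloc ** b, 1))
# 1    3.0
# 10   39.5    -> ~13x the effort for 10x the code
# 100  521.3   -> ~174x the effort for 100x the code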

Mike; in regards to the number of dependencies (inter-entity communication) we have various metrics that attempt to measure the amount of Coupling in programs. A good example could be the Coupling between Objects (CBO) that Chidamber and Kemerer developed in their Metric Suite for OOD.

Another attribute that I find interesting is Cohesion of Programs (of which there are numerous ways to measure it), which attempts to explain something about the ‘relevancy’ of the various components of a system.

Chidamber and Kemerer also came up with a simple metric for Cohesion in their suite, named Lack of Cohesion in Methods. Simply put, it tries to identify where an Object is trying to do too much for its own good.
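
For the curious, the Lack of Cohesion in Methods number can be sketched very compactly. The version below follows one common formulation of the Chidamber-Kemerer definition (method pairs sharing no instance data, minus pairs sharing some, floored at zero); the method and field names in the example are made up:

from itertools import combinations

# LCOM (one common Chidamber-Kemerer formulation): for every pair of
# methods, compare the instance variables each one touches.
# P = pairs sharing none, Q = pairs sharing at least one.
# LCOM = max(P - Q, 0); higher values hint the class does unrelated jobs.
def lcom(method_fields: dict) -> int:
    p = q = 0
    for (_, fields_a), (_, fields_b) in combinations(method_fields.items(), 2):
        if fields_a & fields_b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)

# Hypothetical class: two methods work on order data, one only writes a log.
print(lcom({
    "add_item":    {"items", "total"},
    "apply_tax":   {"total"},
    "write_audit": {"log_path"},
}))   # -> 1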

You will quickly find when using such Metrics that there is ‘No one metric to rule them all’, and the most powerful tool is your own interpretation of what the measure is, whether it be LOC or something more absurd like Halstead’s Software Science.

Do we get to count the lines of code we throw away?

I’ve never worked for a “software shop” but have always done software packages directly for the company that I’m employed by. There is no way I could judge a project’s complexity by the number of lines of code, except in a very vague and useless way.

Much of my coding time is of the type where I have to write communication software from one package or program to another. Lots of times, I need to refactor and throw away large chunks of code that become obsolete as my project requirements change, or as particular details of “how stuff actually works” come to light.

I think this is a classic case of measuring what’s easy to measure, rather than measuring what’s relevant.

It’s hard to measure softer things like design decisions, interdependencies, and complexity, so we don’t.

And when taken to the extreme, where management uses LOC as a productivity metric, then everyone just games it with code like:

if (a > b)
{
c = a;
}
else
{
c = b;
}

rather than

c = a > b ? a : b;

To be honest, I think it’s a good day when I’ve deleted lots of code. So is that negative work?

Everyone is jumping on the ‘LOC is a terrible measure of productivity’ bandwagon, but the article is using LOC to measure project size.

I agree that LOC is useless as an ongoing productivity measure. But after the dust has settled and a project is “done”, I can see using LOC as a measure of project size.

The article points out that a project that is 10 times as large is actually more than 10 times as difficult to implement. This is further support for the view that LOC is not a good measure of productivity.

Cheers
Chris

I’m going to go out on a limb and say Lines Of Code are completely irrelevant.

I agree, except there’s clearly an order of magnitude jump in complexity between a 1,000 LOC project and a 10,000 LOC project.

So it’s still useful for gauging relative size.

I think it’s also, if properly calibrated to YOUR project, a useful project metric. But it should never be used as the only data point on a project!

I used to write rather decent assembler code in the 80s and 90s and I’m not busy being dead! I’m busy writing rather decent C++ :wink: No, LOC is no measure of productivity and certainly no measure of quality. Perhaps in the days of Fortran 77 it was, but the free-formatting of C, C++ and so on makes LOC irrelevant in my humble opinion.

Whenever LOC is brought up, this Dijkstra quote is appropriate:

“This (LOC) is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.”

– EWD1036

Lines of code means absolutely nothing. Heck, I’ve re-written Javascript on the home page of a new website I am working on 3 times in the past two days, trying to get all of my CFAjax and scriptaculous drag-n-drop code stuff working correctly. It doesn’t help that this is the first time I have gone beyond just the experimentation stage with either of these.

I might have a net of 100 lines of code that is now functioning as I wanted it to, but I might have grossed about 500+ lines between yesterday and today.

With the new “agile” software development ways of thinking, wouldn’t measuring how long it would take to meet a specific functional requirement be a better metric?

For example:

Let’s say that you need to create a drag-n-drop interface that allows you to quickly assign work orders to support staff. (Just bare bones stuff for now…)

Requirement 1: Have enough of a database in place so you can create some dummy work orders and dummy support staff.

Requirement 2: Create the UI that handles the display of any new work orders.

Requirement 3: Create the UI that handles the display of all support staff members.

Requirement 4: Create/learn the drag-n-drop UI code that will allow you to drag work orders to a support staff member.

Requirement 5: Write the code to update the database to reflect the assignment of a work order to a support staff member.

Requirement 6: Refresh the UI to show the results of the previous requirement.

Now, all of these requirements should take just about the same amount of time, +/- 20%. 5 and 6 are pretty fast, so you can combine them…but anyways…

If you break down functional requirements into pieces that “should” take an equal amount of time to complete, you can use this as something to measure against…no?

Thanks MoZ! I knew I wasn’t being at all original, but didn’t know the term :slight_smile: