Diseconomies of Scale and Lines of Code

I think complexity would be better measured by the number of objects (doesn’t have to be OO) and the number of connections between them, much like calculating the complexity of a neural network or other similar systems.

When I create a new class there may be lots of code, much of it filler and error checking, but that doesn’t actually mean the class is complex. But if that class has to interact externally a lot, and there are lots of other classes that do the same, things get more complex and more difficult for my brain to track.

In many ways, the more difficult it is for your brain to understand the workings, the more complex a project is. Maybe a more common-sense approach, one that relates to what we consider complex in the rest of the world, should be applied to software engineering.
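A minimal sketch of that idea, assuming you already know who talks to whom (the graph below is hand-written and hypothetical; a real version would extract it from imports or call sites):

    # Hypothetical example: complexity as objects plus connections.
    dependencies = {
        "OrderService":    ["Database", "EmailSender", "Inventory"],
        "Inventory":       ["Database"],
        "EmailSender":     [],
        "Database":        [],
        "ReportGenerator": ["Database", "OrderService"],
    }

    nodes = len(dependencies)
    edges = sum(len(targets) for targets in dependencies.values())

    # A big class with no connections adds little here, matching the
    # intuition that filler and error checking aren't what hurts.
    print(f"objects: {nodes}, connections: {edges}")
    print(f"connections per object: {edges / nodes:.2f}")

A class stuffed with boilerplate scores the same as a tiny one; only its interactions move the number.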

Has anyone tried out SLOCCount? http://www.dwheeler.com/sloccount/sloccount.html

Lines of code is a very real and valuable metric. But here’s the twist: Smaller is better.

More lines of code - more bugs.

More lines of code - greater cost.

More lines of code - less maintainability.

If you’ve thrown away 500 lines of code, and solved the problem with 100, you have 1/5th as many bugs, and 1/5th as many headaches down the line.

This is a victory, and, in our shop, a cause for celebration. As usual, Dijkstra has a good point.

“6200 lines a year”?

This must be language dependent, either that or I need to slow down.

Philip Su may be an asshat (after all, he writes a Microsoft blog, doesn’t he?), but you’re both quoting selectively.

http://blogs.msdn.com/philipsu/archive/2006/06/14/631438.aspx

Su writes, and you quote: “Vista is said to have over 50 million lines of code, whereas XP was said to have around 40 million. There are about two thousand software developers in Windows today. [… The] typical Windows developer has produced one thousand new lines of shipped code per year”.

Su adds in parentheses, “(Yes, developers don’t just write new code, they also fix old code. Yes, some of those Windows developers were partly busy shipping 64-bit XP. Yes, many of them also worked on hotfixes. Work with me here.)”

So it’s disingenuous to quote just the first part and then attack Su’s “conclusion”. On the other hand, Su’s parenthetical also misses the point. Does Su really believe that Windows Vista was created merely by adding 10 million LOC to XP? I bet at least one-third of XP’s code (GUIs, file system innards,…) was totally scrapped and rewritten for Vista, which would make for some 15 million new LOC right there.

On the other hand, do you really believe that Vista is an “operating system” in the hacker sense of the term? “Writing an operating system is quite a challenge”, definitely, but 45 million of Vista’s 50 million LOC aren’t “an operating system”; they’re GUIs and office apps and screensavers and a Web browser and a .NET virtual machine and an API for the window manager and rules for pluralization in Turkish and hotkeys that make the windows zoom around in 3D. There is no fundamental difference between the vast majority of Vista and “a business app”.

However, none of this back-and-forth posturing would be necessary if you’d just follow your own advice that LOC really is a terrible measure of productivity! Who cares if Redmond produces 1000 LOC a year or 100,000? Not I.

I’ve found that quite a handy metric for a quick overview is to look at the bug count, in two ways:
New bugs older than X days: by looking at the number of new bugs that haven’t been fixed within, say, two weeks, you get an idea of how far behind the project is. If you are hemorrhaging 10 or 20 bugs a week, that’s not so bad, but more than 5 or 10% of your project’s total bug count can cause pretty big problems, I think.

Bugs fixed per week / bugs found per week: with this metric you get the handy psychological benefit of having a goal. The ratio approaches 1, and when you reach that goal it feels good. Once you get past 1, you know the project is doing well.
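Both are easy to compute from a tracker export; a quick sketch, with hypothetical record fields:

    from datetime import date, timedelta

    # Hypothetical bug records: (date opened, date fixed or None).
    bugs = [
        (date(2024, 1, 15), date(2024, 1, 18)),
        (date(2024, 1, 16), None),
        (date(2023, 12, 1), None),
    ]

    today = date(2024, 1, 20)

    # Metric 1: open bugs older than X days (here, two weeks).
    stale = sum(1 for opened, fixed in bugs
                if fixed is None and today - opened > timedelta(days=14))

    # Metric 2: bugs fixed this week / bugs found this week.
    week_ago = today - timedelta(days=7)
    found = sum(1 for opened, _ in bugs if opened >= week_ago)
    fixed = sum(1 for _, f in bugs if f is not None and f >= week_ago)

    print(f"open bugs older than 14 days: {stale}")   # -> 1
    print(f"fixed/found ratio: {fixed / found:.2f}")  # -> 0.50

(Guard against found being zero in real use.)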

It’s really just personal preference, but I like having as many metrics as I can to determine how bad I am :)

I find an appropriate way of measuring a project is in weight.
After weighing a couple of files, I’ve decided that 1000 LOC is roughly 28 KB, and I’ve used this as my benchmark of measurement.

This method doesn’t discriminate between how you space your braces, or how much commenting you have - in fact, I would consider excess internal documentation a good thing, an indication of a more manageable and quality product.

Just make sure not to weigh any media too :)
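For the curious, weighing a project is a few lines of script. A sketch (the "src" directory and the extension list are assumptions):

    import os

    total_bytes = 0
    total_lines = 0
    for root, _, files in os.walk("src"):
        for name in files:
            if not name.endswith((".py", ".cs", ".c", ".h")):
                continue  # skip media and binaries, as warned above
            path = os.path.join(root, name)
            total_bytes += os.path.getsize(path)
            with open(path, encoding="utf-8", errors="replace") as f:
                total_lines += sum(1 for _ in f)

    print(f"{total_bytes / 1024:.0f} KB over {total_lines} LOC")
    if total_lines:
        # KB per KLOC = (bytes/1024) / (lines/1000)
        print(f"{total_bytes / (1.024 * total_lines):.1f} KB per KLOC")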

A small personal project of mine was 40 KB (1,000 LOC). Compared to a larger 192 KB (7,000 LOC) personal project, there’s not too much of a jump in complexity; I’d mainly attribute the difference to an abundance of internal documentation in the larger project.

But when compared to the 1,999 KB (71,000 LOC) project at work - which has lax internal documentation, and was written by somebody else - I would say it is an order of magnitude more complex.

I would consider a day where I go through my code and heavily document it - doubling my LOC - a great day.

In summary, I find it hard to justify LOC as a good measure of anything - size, productivity, functionality, or complexity.

Doesn’t anyone here have any actual training in metrics? Those of us who have been doing this for a while - and were properly trained - know that LOC, as bad as it is, does work. What you’re all dancing around is the fact that, like any metric, IT MUST BE CALIBRATED to the environment.

BTW, a line of code is a line of code is a line of code, as long as you count consistently: I once had a project where we counted the lines three different ways to keep the lunatics upstairs happy, but, once each measurement was calibrated, they yielded identical results across products AND, strange as it seems to the uninitiated, even across languages. (The catch there is that the higher the level of the language being used, the less lines of code required to produce the same functionality - see the works of Halstead for a complete explanation.)

As far as using LOC and the limitations thereof, refer to Barry Boehm’s work (the CoCoMo guy).

If you’re a Function Points fan, just try to compare an embedded system’s size to that of a typical COBOL app using the FP counting rules - it can’t be done. BUT, it can when using LOC and full-blown CoCoMo.
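For anyone who hasn’t met it, basic COCOMO is just a power law from KLOC to effort. A sketch using Boehm’s published organic-mode coefficients (which, per the point above, still need calibrating to your shop before you trust the output):

    def basic_cocomo_organic(kloc: float) -> tuple[float, float]:
        # Basic COCOMO, organic mode (Boehm, 1981):
        # effort in person-months, schedule in calendar months.
        effort = 2.4 * kloc ** 1.05
        schedule = 2.5 * effort ** 0.38
        return effort, schedule

    effort, months = basic_cocomo_organic(32)  # a 32 KLOC project
    print(f"~{effort:.0f} person-months over ~{months:.0f} months")  # ~91, ~14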

Bottom line on counting: it doesn’t matter how you count, as long as you consistently count the same way. Count comments or don’t; count white space or don’t; count declaration statements or don’t; count multiple line statements as one statement or don’t. As long as you do it the same way, it can be calibrated and used effectively.
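To make that concrete, here’s a sketch of a counter where the policy is explicit (the flag names are mine; the point is to pick values once and never change them mid-project, or the calibration breaks):

    def count_loc(source: str, *, count_blank: bool = False,
                  count_comments: bool = False) -> int:
        # Counts physical lines under one consistent policy. Change
        # the flags and the absolute numbers move, but consistent
        # use still calibrates.
        total = 0
        for line in source.splitlines():
            stripped = line.strip()
            if not stripped:
                total += 1 if count_blank else 0
            elif stripped.startswith(("#", "//")):
                total += 1 if count_comments else 0
            else:
                total += 1
        return total

    print(count_loc("x = 1\n\n# a comment\ny = 2\n"))  # -> 2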

HOWEVER, the BEST way is to maintain appropriate detailed records of productivity and refer to them. Just remember, you’re asking for an ESTIMATE, not the final numbers. And ESTIMATES can vary as much as 16x at the beginning of a project.

Better to have a rock solid development process that runs the amateurs off and keep your software folks under control so they don’t introduce unneeded functionality or experiment more than necessary.

If you want a challenge: count non-blank characters instead. Then you can argue over whether or not long variable names cost more than short ones. They can easily be shown to cause more compile errors due to typos, but does the maintainability increase offset that?
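Counting non-blank characters is nearly a one-liner, if anyone wants to run the experiment:

    def nonblank_chars(source: str) -> int:
        # Long identifiers cost more under this measure than short ones.
        return sum(1 for ch in source if not ch.isspace())

    print(nonblank_chars("total = a + b\n"))  # -> 9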

Folks, this is a tempest in a teapot: use what works for your organization. Don’t waste energy arguing over which end of a soft boiled egg is the right one to eat it from!

A few years ago I took on a component of a large project and removed 90% of the lines of code with the result that it was 10 times faster (and 95% less memory-hungry, and far more robust). This improvement meant it was no longer necessary to increase the cost of the embedded hardware it ran on by adding RAM.

Does this mean I should have been fired?

It’s all features and value to the end user, and the one who does it in the fewest lines of maintainable code wins - Jon Galloway

So if none of the lines of code I write are maintainable does that mean I win?

Tim,
In our company we take the absolute size for Add/Modify/Delete, but with a weighting factor: 100% for new added LOC, 67% for changed LOC, and 40% for deleted LOC. In our experience this provides better results.
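As a quick sketch of that weighting (the sample counts are made up):

    def effective_loc(added: int, changed: int, deleted: int) -> float:
        # New code counts in full; edits and deletions count
        # partially, per the weighting factors described above.
        return 1.00 * added + 0.67 * changed + 0.40 * deleted

    print(effective_loc(added=500, changed=300, deleted=200))  # -> 781.0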

Pranay

Steve, can you please let us know the source of your
[Using software industry productivity averages]?

Thanks
webmaster

"""
For one thing, different languages vary widely in the number of lines of code they produce. 100 lines of Perl will probably accomplish a lot more than 100 lines of C. So you have to be careful that you’re really comparing apples to apples.
"""

I read a study somewhere that says the ratio of bugs per lines-of-code stays surprisingly constant irrespective of the language that is being used. This is a great reason to use languages that let you get more done in less LOC, such as Perl, Python (my favourite) or Ruby. Annoyingly I can’t find a direct reference to cite on this.
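Back-of-the-envelope, assuming some constant density just for illustration (the 15 defects per KLOC figure below is made up, not from the study I can’t find):

    DEFECTS_PER_KLOC = 15  # assumed constant across languages, per the claim

    def expected_bugs(loc: int) -> float:
        return loc / 1000 * DEFECTS_PER_KLOC

    # The same feature, written verbosely vs. tersely:
    print(expected_bugs(500))  # 500 lines of C      -> 7.5
    print(expected_bugs(100))  # 100 lines of Python -> 1.5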

Here’s a related idea, again with no formal source though:

http://www.artima.com/intv/speed.html

"""
This is all very informal, but I heard someone say a good programmer can reasonably maintain about 20,000 lines of code. Whether that is 20,000 lines of assembler, C, or some high-level language doesn’t matter. It’s still 20,000 lines. If your language requires fewer lines to express the same ideas, you can spend more time on stuff that otherwise would go beyond those 20,000 lines.
"""

I’m going to go out on a limb and say Lines Of Code are completely irrelevant. It’s the sort of statistic you’d use to measure something you didn’t understand - like phrenology or something. We’ve got no idea how to track how much business value we create per hour, so we’re going to track a statistic that - though irrelevant - is easily quantifiable.

I work hard to write as little code as possible; does that make me less productive than someone who re-implements logging and user authentication in systems that natively support these features? It’s all features and value to the end user, and the one who does it in the fewest lines of maintainable code wins. That maintainable part is important - I don’t believe that 20,000 lines of assembler is nearly as maintainable as 20,000 lines of C#, mostly because the people who can write good assembler code are too busy being dead.

The scale is definitely not linear as the number of lines of code increases.

I don’t think LOC is an accurate metric. You could have tens of thousands of lines of code just for doing mundane tasks, like adding parameters to the thousands of stored procedures your system has.

Anyway, the project I am working on has 619,000 lines of C# code (just the .cs files), not counting the aspx files and the many stored procedures, so counting everything it’s probably close to a million.

More useful tools are code complexity analysis tools. I ran one on our code base, and many files were over 20 in cyclomatic complexity, with the grand prize being 338! Now that’s some serious spaghetti code! I think Jeff should organize a contest: the most complex method in production wins. Everyone would have to use the same tool to do the analysis, and the winner (if it can be called that) gets a free Coding Horror T-shirt.
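For anyone who wants to hunt their own grand-prize winner, here’s a rough sketch of McCabe-style counting for Python; real analyzers handle far more cases, this just counts branch points plus one:

    import ast

    def cyclomatic_complexity(func_source: str) -> int:
        # Rough McCabe number: one path, plus one per decision point.
        # Real tools also handle match statements, asserts, etc.
        tree = ast.parse(func_source)
        decisions = sum(
            isinstance(node, (ast.If, ast.For, ast.While,
                              ast.ExceptHandler, ast.BoolOp))
            for node in ast.walk(tree)
        )
        return decisions + 1

    src = """
    def classify(n):
        if n < 0:
            return "negative"
        elif n == 0:
            return "zero"
        for d in range(2, n):
            if n % d == 0:
                return "composite"
        return "prime-ish"
    """
    import textwrap
    print(cyclomatic_complexity(textwrap.dedent(src)))  # -> 5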

After I ran the tools, I commented to a fellow employee that the code I inherited will always have bugs in it, no matter how many we fix, just due to the sheer amount of code in the system and the way it was written. That wasn’t clearly understood, because no one had taken the time to analyze the current code. Future enhancements and projects on this code base have a high chance of failure due to its current state.

I think any software project should strive to:

Keep number of lines of code to a minimum
Keep complexity of code to a minimum (using standard metrics)

I bet you’ll find that most projects fail because there is too much code and it is too complex to manage and update (Bad design practices, nobody knows how it works, etc).

Simon Willison,

Any numbers on error rates if the programmer manages more or less than 20,000 lines of Code?

First guess: 58 to 74 bugs in that 20,000 lines of code.
Odds the 20,000 lines are defect free: Trace (1.9e-29)

Ave_bugs = 20,000 lines of code * 0.0033 professional blunder rate = 66
Sigma_bugs = sqrt(20,000 * 0.0033 * (1 - .0033)) = 8.1
Defect_free = (1-.0033)**20000 = 1.9e-29