Size Is The Enemy

john35 · December 24, 2007, 12:00am

This problem is as old as the hills – well almost.
I recall coming to a similar conclusion in the late sixties, long before Java etc. In those days all work was done by batch processing using magnetic tape as storage. It’s quite different now, but the code problem was the same. Once written, a line of code will haunt its author forever. Our average program size then was 5000 lines of low-level code, as much of a maintenance nightmare as programs a hundred times bigger today.
John

N1151 · December 24, 2007, 12:00am

Yeah yeah, we get it: everything, EVERYTHING should be written in JavaScript because it’s the best programming language ever. There shouldn’t be any desktop applications other than web browsers because everything should be written in JavaScript and run as a web app.

Quit Vertigo and go work at an Initech and we’ll see how cute some of this BS sounds.

Steve · December 24, 2007, 12:00am

Sometimes it is helpful to understand WHY a language was written:

C++: does anyone know?
Java: to run small appliances
C#: to have a Java-like language that does .NET
Pascal: to teach how to program

etc.

Languages that go backwards in sophistication are written: SQL

We misuse languages. When I used 4GLs (Focus, NOMAD2) to replace what Cobol and PL/1 did, it was freakish how much could be done with so few lines of code, but if you didn’t know what you were doing you could abuse the CPU.

There are better languages for certain tasks. The right balance of control vs. brevity is hard.

ShantiB · December 24, 2007, 12:00am

Lol - like others, the first thing I did after reading Steve’s article was run a LoC counter on one of my rather large projects.

It’s a rails project that I maintain part-time (like Steve and his program):

Controllers: 3549 LoC
Models: 2683 LoC (64 models)
Libraries: 2629 LoC
… and a bunch of plugins I didn’t write and don’t have to think about too much, not to mention the Rails framework itself which abstracts about 80% of the CRUD gruntwork. (to my friends who still handwrite all their SQL: I look at them like Betamaxes)

Lest we forget, Mr. Yegge has a full-time job at the big G and this is just his part-time baby. If you’ve ever had a project like this that you abandon for a few months and then come back to periodically, you know how hard it is to get back into it and remember how certain aspects worked. I have this problem on 10k LoC – I couldn’t imagine the painfulness of this issue on a 500k LoC codebase.
Several commenters are probably right not to place the blame entirely on static languages. Java just has a level of verbosity that is soul-crushing when noticed en masse.
A commenter noticed how it can be hard to read other people’s Ruby code. I would tend to agree that this can be a problem, especially when taking over: A) some Ruby rockstar’s code, who likes to show off the nooks and crannies of the language, or B) some novice’s code, who doesn’t know some of the easy-to-understand power tricks of the language.

Great recap, though.

nyenyec · December 24, 2007, 12:00am

I code in both Python and Java and it’s not dynamic typing that makes me more productive in Python (and other similar languages) but the syntactic sugar. Actually I wish I could declare the types of my variables in Python so I had proper refactoring support and the compiler would catch all the stupid mistakes I often make.

Observe the best of both worlds in the Boo language for example:

http://boo.codehaus.org/Builtin+Literals

(I wish it was not .NET specific)
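
For what it's worth, later Python versions (3.5 and up) added optional type annotations, which come close to the wish above. A minimal sketch, where `total_length` is a made-up function used only for illustration:

```python
from typing import List


def total_length(words: List[str]) -> int:
    """Sum the lengths of all words; the annotations document the expected types."""
    return sum(len(word) for word in words)


# The interpreter does not enforce annotations at run time. A separate
# static checker (mypy is one example) reads them and reports mismatches,
# such as passing a list of ints here.
print(total_length(["size", "is", "the", "enemy"]))
```

The annotations stay optional, so the rest of a code base can remain untyped while the parts that need refactoring support get declared types.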

Nick · December 24, 2007, 12:00am

I know that any competent programmer can easily learn Python or whatever we’d switch to, but unfortunately not having thousands of people in the area who know it is a problem for management.

[I like how using a dynamic language makes the responsibilities of your automated testing larger, instead of implying that since the static type checking went well, everything will work fine…]

MichaelP · December 24, 2007, 12:00am

Almost every argument I read for dynamic languages includes a one-line statement that would take 3 to 7 lines in a static language.
The one-liners only emphasize shortcomings in the library.

I could write a library that would make the same line of code work in any static language. It’s just that static language designers often fail to ship a well-designed API with their product. Compare GregorianCalendar to Joda (http://joda-time.sourceforge.net/).

I’m happy that Bruce can recite the one-line file parser from memory. But frankly, I can’t. I need a little guidance when I type the dot operator, so I will stick with Java because of the intellisense support that Eclipse gives me. And if I take the time to find a decent file API for Java (or write one), then it can help me type that one line of code.

AlexR · December 24, 2007, 12:00am

I am also a Python programmer, and it is indeed much easier to write, read, and debug Python code. Having written several network applications in C, C++ and in Python (in that order), I realized that I could have saved a lot of time if someone had told me about Python earlier.

If you’re a teacher, make sure you tell your students about this option.

I must emphasize one detail - in either case the #1 enemy is the human factor (i.e. us), as you’ve written in the last paragraph. My time management routines are far from perfect; I have several projects that have been idle for several months - and it’s only my fault.

Rick_Cabral · December 24, 2007, 12:00am

I loved the part about the Tetris pieces.

A lot of the junior programmers I have to shepherd are “satisfied” with what comes out of the box (we’ll call it ASP.NET 2.0 for the purposes of this conversation). They have a hard time understanding why I frequently suggest writing controls from scratch, or favoring string outputs instead of nested control hierarchies. Page.Load() is their friend, even when it leads to complete spaghetti.

In the general spirit of the Pragmatic Programmers, I view stock controls and extended libraries as “Evil Wizards” who should be trusted only as much as a senior Sith Lord. Learn what you can, then supplant them with something younger and more nimble. Microsoft themselves encourage simple custom logic for ASP.NET output, especially to improve scalability and performance.

As for lines of code, I couldn’t care less about code brevity, as long as two concepts are integrated into that code:

Write code that does ONLY what you need it to do (YAGNI)
Write the same logic once, and ONLY once (DRY)

Naturally, following the above tenets will lead to a small code base, but shortening code lines alone misses the spirit of the issue.

I should note that my job involves creating “glue code” almost exclusively; standing on the shoulders of someone else’s product, not maintaining a revenue-generating legacy.

Your results may vary.

rien · December 25, 2007, 12:00am

@tony morris

after re-reading the paper you re-pointed me at: you are right, my statement is misleading. (i must add that i do not agree with everything in this paper)

my programming experience is wide but i mostly used strongly-statically-explicitly typed languages, which i am a big fan of, and those considered extremely verbose: Pascal, Modula-2, Ada (no, not COBOL…).

PatrickS · December 25, 2007, 12:00am

I have 2 comments:

First, every developer claims to be maintaining many more lines of code than they actually are. Recently I was consulting on a large project; they said 4M LOC, but it was actually 700K LOC (measured with NDepend). The difference comes from how you measure LOC, and I detailed here how you should do it in the .NET world:
http://codebetter.com/blogs/patricksmacchia/archive/2007/10/03/how-do-you-count-your-number-of-lines-of-code-loc.aspx
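
To show how much the counting method alone can move the number, here is a rough sketch that compares physical file lines with lines that are neither blank nor comment-only. This is not NDepend's method (the linked post describes that); `count_loc` and the `#` comment prefix are assumptions made for this example.

```python
def count_loc(source: str, comment_prefix: str = "#") -> dict:
    """Count physical lines and 'code' lines (non-blank, not comment-only)."""
    physical = 0
    code = 0
    for line in source.splitlines():
        physical += 1
        stripped = line.strip()
        # Skip blank lines and lines that hold nothing but a comment.
        if stripped and not stripped.startswith(comment_prefix):
            code += 1
    return {"physical": physical, "code": code}


sample = """# compute a total

def total(xs):
    # add everything up
    return sum(xs)

"""
print(count_loc(sample))  # {'physical': 6, 'code': 2}
```

Even on this tiny sample the two counts differ by a factor of three, so two people quoting "LOC" for the same project can honestly be far apart.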

Second, the phenomenon of large code bases being hard to maintain is better known as diseconomy of scale. It explains why it can take a year to add a few LOC to a large project such as Vista (70M LOC). Maintenance cost is simply not linear in code base size; it grows faster than that.

sapphirecat · December 25, 2007, 12:00am

(I hate how popular this blog is. I never get into the comments before they’ve degenerated into useless noise.)

All dynamic languages mean is that you can write a horrible mess in far fewer lines. I had a 16 KSLOC (24K actual file lines) project get unmanageably huge in PHP. Somewhere along the line, the cost of new features jumped from a week or so up to several months. We killed the project before we found out how long, exactly.

Even 1000 lines of Perl can be pretty big for a single person to manage.

It seems to me that programming is a fine balance. Trying to over-compact code is as bad as bloating it needlessly. It’s not easy to handle a function with 10 parameters. Keyword arguments help, but only in Python where they’re done right. Passing arrays/hashes as in PHP or Ruby is verbose and opaque, and Common Lisp’s interaction between key and optional is subtle and evil.
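
A small sketch of the contrast, with made-up names (`connect`, `connect_with_options`) and defaults chosen only for illustration. It shows Python keyword arguments next to the "pass a hash of options" style:

```python
def connect(host, port=5432, *, timeout=10.0, retries=3, use_tls=True):
    """Everything after the bare * must be passed by name."""
    return {"host": host, "port": port, "timeout": timeout,
            "retries": retries, "use_tls": use_tls}


def connect_with_options(host, options):
    """Options-hash style: unknown keys are silently ignored."""
    return {"host": host,
            "port": options.get("port", 5432),
            "timeout": options.get("timeout", 10.0)}


# The call site names each setting, and a misspelled name fails loudly.
settings = connect("db.example.com", timeout=2.5, use_tls=False)
try:
    connect("db.example.com", timeuot=2.5)  # typo on purpose
except TypeError as exc:
    print("rejected:", exc)

# The same typo in an options dict goes unnoticed and the default wins.
print(connect_with_options("db.example.com", {"timeuot": 2.5})["timeout"])
```

The keyword version checks names for free, while the options-hash version needs its own validation code to get the same safety.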

In the end, writing less code means writing fewer bugs, but it also raises the complexity:LOC ratio. Less code in and of itself is no silver bullet.

Doug · December 25, 2007, 12:00am

LOC is not directly related to complexity, but it is an indication. Even blank lines, comments and lines with only braces can be counted. It takes that same time to scroll through a blank line as through a statement line.

In my experience only about 10% of a program code is the core. The rest of the program is the gunk that glues it all together and does input/output. Of the 10% core 90% is dealing with boundary conditions and edge cases leaving 1% that is the heart of the program.

Computer Science textbooks show the tiny 1% part that can be expressed cleanly. When you try to write a real program you find that 99% of it is gunk they didn’t warn you about in the textbooks.

PatrickS · December 25, 2007, 12:00am

LOC is not directly related to complexity, but it is an indication.

The metric Cyclomatic Complexity is a good indicator of complexity:
http://www.ndepend.com/Metrics.aspx#CC

It takes that same time to scroll through a blank line as through a statement line.

I disagree. We are not talking about scrolling but about understanding and maintaining source code.

In my experience only about 10% of a program code is the core. The rest of the program is the gunk that glues it all together and does input/output.

I disagree. Every single line of code can cause a bug or a performance hit. Input/output is often what makes the difference between a successful piece of software that is easy to work with and satisfies users (and consequently sells well), and a clever piece of software that satisfies the developer’s ego but is unusable and cannot be sold.

and · December 25, 2007, 12:00am

It takes that same time to scroll through a blank line as through a statement line.

Man, need think

John · December 25, 2007, 12:00am

Languages that use dynamic typing do result in fewer lines, but end up taking vastly more memory and time to execute. No dynamic language can be converted to machine instructions (as in C/C++), a cross-platform assembly (.NET), or even byte-code (Java). This isn’t a shortcoming of the languages or compilers; it’s the nature of computers (more specifically, processors).

T_E_D144 · December 26, 2007, 12:00am

If you personally rewrite 500,000 lines of static language code into
190,000 lines of dynamic language code, you are still pretty
screwed. And you’ll be out a year of your life, too.

Probably more than a year. 500KSLOC is a ton of code. There’s no way a single person is converting that amount of code to another language in that little time without tool help. Using a converter program might make it possible, but you’d probably end up with more code than you started with. It would be machine-generated code too. Ick.

If he performs this rewrite in a year of hobby time with no major loss of functionality I’ll be really impressed. Like starting-a-new-religion level impressed.

T_E_D145 · December 26, 2007, 12:00am

The metric Cyclomatic Complexity is a good indicator of complexity:
http://www.ndepend.com/Metrics.aspx#CC

Well…perhaps for an individual routine. That’s all it is really designed for. For an entire 500KSLOC program you aren’t going to get a number out of it that means anything.

There are some common constructs, for instance command processors (large case statements) that totally hose most cyclomatic complexity calculators too.

Cyclomatic Complexity can be useful to point out which functions or source files might need extra attention, but it isn’t a very useful tool for looking at entire projects at a macro level.
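
As a rough illustration of the per-routine use described above, here is a simplified counter built on Python's standard `ast` module. It is an approximation (one plus the number of decision points), not the calculation NDepend or any other tool performs, and the names `cyclomatic_complexity` and `complexity_by_function` are invented for this sketch.

```python
import ast

# Node types that add one decision point each in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra paths.
            score += len(node.values) - 1
    return score


def complexity_by_function(source: str) -> dict:
    """Map each function name in the source to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


sample = '''
def dispatch(cmd):
    if cmd == "start":
        return 1
    elif cmd == "stop":
        return 2
    elif cmd == "status":
        return 3
    return 0

def add(a, b):
    return a + b
'''
print(complexity_by_function(sample))  # {'dispatch': 4, 'add': 1}
```

The `dispatch` function is a tiny command processor: it is trivial to read, yet its score grows by one with every command added, which is the distortion mentioned above. The per-function numbers are still handy for spotting routines worth a second look.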

OneMist8k · December 26, 2007, 12:00am

+1 to rubix for library support.

+1 to sapphirecat who said: “All dynamic languages mean is that you can write a horrible mess in far fewer lines.” Yes. I worked on some 4GL tools that could make a real steaming pile in very few LOC.

Ah, the real reason for the re-write. Wesley Shepard said, “Worse, the code is a mishmash of ASP/VBScript (ugh) and ASPX/C# (for those parts I have been able to update).”

I look at code I put down a year ago and always think it sucks. I can’t imagine what Wesley sees when he looks at his 10 year old code… (shudder)

Duncan · December 26, 2007, 12:00am

I applaud you for admitting that you use a lot of Yegge’s stuff - his posts are indeed rich food for all our minds. Raganwald shamelessly rips him off, and through his awkward rephrasing loses much of the meaning that comes from actually reading the whole Yegge post. Your condensing and commentary are a much better approach.