Code Smaller

Unless you've been living under a rock for the last few years, you've probably heard about the game Katamari Damacy. The gameplay consists of little more than rolling stuff up into an ever-increasing ball of stuff. That's literally all you do. You start by rolling up small things like matchsticks, thimbles, pushpins, and so on. As the ball gets larger, you roll up ever-larger items. Eventually, your Katamari ball gets so large you end up rolling together cities, mountains, clouds, and finally entire planets. It's unbelievably fun, and completely mesmerizing.

This is a companion discussion topic for the original blog entry at:

I highly recommend Katamari Damacy – even if you just rent it for a weekend. One of the strangest/funnest games I’ve ever played.

And surprisingly on topic, because it's Feb 14: a Katamari valentine card from Beavotron:

Yeah, a bunch of my friends love that game.

But I think the goal is to keep code manageable. Really good code generation is key for things like database access and data objects, I believe.

Rules of unnecessary optimization:

  1. Don’t.
  2. If you’re tempted to violate rule 1, at least wait until the program is finished.
  3. Nontrivial programs are never finished.

That being said, keeping your code in a state where it is manageable is not unnecessary optimization.

Hmm. Smaller is better? I think what we're really looking for is "modularized properly" is better. A lot of projects are dived into far too willy-nilly (especially by neophyte/novice/hobbyist programmers), with no pre-planning of what the thing should do and how to logically code it. If you're any good at your code, you should be able to sort tasks into obviously autonomous units, knowing how they'll interplay and how they'll work solo. A lot of the code I read from others can only be described as a hunt 'n patch job: inelegant, unreadable, and difficult to support…

“modularized properly” is better

And what is modularization other than dividing something up into smaller, independent subunits?

Unfortunately, this is contrary to what is being taught in some high schools. I know that in at least one AP Computer Science textbook, they teach that everything should be in one class, with little concern for later changes and optimization. The root of this problem starts at a young age; proper techniques should be shown in schools.

Jeff, you asked what modularization is, but I think you’re, intentionally or not, hinting at the real problem. Many programmers perform a modular breakdown of the top problem and stop there, writing each module as a monolithic “chunk”. It’s hard to learn to perform modularization on the modules, and to do that recursively until the modules are small enough to verify by inspection.
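To make that concrete, here's a hypothetical sketch (the order-processing names are invented for illustration, not from the post) of a monolithic chunk broken down recursively until each piece is small enough to verify by inspection:

```python
# A monolithic "chunk" might parse, validate, and total an order in one pass.
# Decomposing it again at the next level down yields units you can verify
# just by reading them.

def parse_line(raw):
    """One level down: turn 'name,qty,price' into a (name, qty, price) tuple."""
    name, qty, price = raw.split(",")
    return name, int(qty), float(price)

def line_total(line):
    """Another small unit: the cost of a single parsed line."""
    _, qty, price = line
    return qty * price

def order_total(raw_lines):
    """The top-level module is now just composition of the pieces below it."""
    return sum(line_total(parse_line(raw)) for raw in raw_lines)

print(order_total(["widget,2,1.50", "gadget,1,4.00"]))  # 7.0
```

Each function here is short enough that the unit tests are almost a formality, which is the point of the comment above.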

(Not that you should skip unit tests, but it’s nice to be sure the tests will pass from reading the code, too.)

Tom Grochowicz, that's because in school, almost every assignment is a standalone project. There's no need to break it up. I see this a lot in the programming assignments we're given: rarely do they overlap. I do try to modularize as much as is reasonable in the semester projects we get.

(By the way, I’m in college).

How does one judge when a method is too long or too complex? Cyclomatic Complexity is a metric that can be used to help determine when a method should be refactored into smaller units.
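As a rough sketch of what such a metric measures, here's a simplified cyclomatic-complexity counter built on Python's `ast` module. It just counts branch points plus one, which is a simplification; real tools (radon, lizard, and the like) are more thorough:

```python
import ast

# Simplified cyclomatic complexity: 1 (the straight-line path) plus one
# for each decision point found in the parsed source.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

snippet = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(n):
        pass
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 4: base path + two ifs + one loop
```

When the number for a single method climbs past a chosen threshold, that's the signal to refactor it into smaller units.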

This isn’t really a new observation, though the association with Katamari Damacy is clever – LOVE THAT GAME.

For an older reference to the same issue, see this paper from 1997:


This paper examines this most frequently deployed of software architectures: the BIG BALL OF MUD.

This is the strength of functional languages in general, and the more obscure Haskell in particular. Most of these languages encourage composing small functions; some make it almost mandatory. Tragically, most functional languages are developed in academia with little regard for the pragmatic things a language must do to really meet the needs of a wider audience: database access, GUIs (and not ugly Tcl/Tk ones, either), and so on.

But, I’m going to argue, just for the sake of argument, that having TOO MANY methods leads to serious overhead in calls and stack usage. Don’t argue with me about the speed of CPUs and how much memory there is available these days, because it ain’t always so. In some cases our code runs on years-old customer Solaris boxes which are held to a fiduciarily-responsible level of cost. Call overhead for 100,000 sets of end-user transactions runs to more than you think, and sooner or later the customer starts complaining about the time taken per transaction. I can’t very well go back to them and say “Yes, the transactions do take a long time, but the code itself is VERY maintainable!”
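Whether call overhead actually matters on a given box is measurable rather than arguable. A quick sketch (a hypothetical micro-benchmark, with numbers that vary by hardware) comparing inlined arithmetic against the same work behind a function call:

```python
import timeit

def add(a, b):
    # The "maintainable" version: trivial work hidden behind a call.
    return a + b

# Time a million iterations of each form; absolute numbers vary by machine,
# which is the commenter's point about old customer hardware.
inlined = timeit.timeit("x = 1 + 2", number=1_000_000)
called = timeit.timeit("x = add(1, 2)", globals=globals(), number=1_000_000)

print(f"inlined: {inlined:.3f}s  called: {called:.3f}s")
```

On modern hardware the difference is usually noise next to I/O and database time, but measuring beats asserting either way.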

There’s also a problem of fractal geometry here, in which each grape in the cluster is composed of more clusters of grapes, and let’s face it, 10 lines of method calls which themselves devolve into 10 method calls, isn’t much of an improvement, because you start to lose the context in which all the calls are made. (Yes, I know about minimizing dependencies, but believe me, it isn’t always as neat and clean as just saying it, especially when you’re dealing with complex host systems.)

I’m baffled that so many experienced software designers are ready to stand up and defend “coding bigger”. The performance benefit of fewer method calls is a thing of the past. Seriously. What’s more expensive – more hardware, or higher development costs and opportunity cost? And if you’re not ready to concede that, then you should be asking yourself what you’re doing programming in a 3GL. You know you can’t get good performance without diving down to assembly language.

The fact that you can’t grasp the big picture when looking at the low level is exactly the point. That’s what makes that low-level code maintainable and reusable. You don’t need to know the big picture to be productive.

I second the earlier post on Big Ball of Mud. That paper is awesome, and is a good explanation of coding bigger.

I really love this blog :slight_smile: thank you so much for sharing this great (small) view on software development. It's very much to the point and has a low tl;dr factor :slight_smile: heh, you made my day again…

Katamari Damacy, huh? Wake me up when it comes out on the PC

Amen, Jeff: lots of clearly purposed little pieces always gives a better design, but somehow this important message is not widely known.
This was brought home forcefully to me a few months ago, when I suggested to a more junior colleague that he refactor some repeated code (only a few lines, but forming a clearly identified atomic unit) into a method, or a class, I can't recall which now. His reaction was astonishment: "surely it's not worth it".
There are a number of factors at play here I think:

  1. Lack of education, pure and simple.
  2. Premature optimisation (crazy but true).
  3. Languages that make it a rigmarole to define ADTs; e.g., in Java we typically need a new file, cannot easily wrap primitives, etc. Paradoxically, we got much better type abstractions in C via the simple typedef!
  4. Local coding standards that require (for well-intentioned reasons) banner headers for every type and method introduced.
  5. Design documents that list the big classes and methods, but do not make it clear to developers that they can - and should - decompose further during implementation.

All of these conspire to make it a “big deal” in the mind of the programmer to break out of the flow and create a separate abstraction (method or type).
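The kind of refactor described above is small enough to show in a hypothetical sketch (the pricing code is invented for illustration):

```python
# Before: the same lines repeated everywhere a price is displayed:
#   amount = round(price * (1 + tax_rate), 2)
#   label = f"${amount:.2f}"

# After: the repeated lines become a clearly identified atomic unit.
def format_price(price, tax_rate):
    """Apply tax and format as a currency string."""
    amount = round(price * (1 + tax_rate), 2)
    return f"${amount:.2f}"

print(format_price(10.00, 0.08))  # $10.80
print(format_price(2.50, 0.08))   # $2.70
```

The extraction costs one function definition; in exchange, a tax-rule change now happens in exactly one place.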

Jeff, you missed one aspect of keeping dependencies low:

Simpler, more independent code is easier to share.

Writing shareable, reusable code, IMO, separates the men from the boys.

The relationship between lines of code and bugs is only linear? That’s surprising; I would have thought that as a project gets larger it picks up more confusing interactions, and the frequency of bugs would go up. Higher frequency x more lines would yield a nonlinear relationship.

It never ceases to amaze me when I go back and work on my old code (either to bug-fix or add features) that I often end up with LESS code that does more faster, more securely, and handles errors better. It is pretty hard to always get this right the first time without falling into the premature optimization trap.