Text Columns: How Long is Too Long?

Ian Griffiths recently wrote a proof-of-concept WPF browser for the MSDN online help. One of the improvements cited is multi-column text:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/06/text-columns-how-long-is-too-long.html

So at three columns, justified, the fast readers were slower than the slow readers?!

I’m a fast reader, and shorter lines of text seem faster to me because my eyes can easily jump to the start of the next line, whereas with longer lines I tend to hunt for the next line when I’m done with the current one.

However, I think this has doubtful applicability to code. IMO, each line should be a new thought and code should normally be browsed with word wrap off and lines extending to infinity. Once a thought needs to be explored, word wrap should be turned on, but otherwise not.

I do think it’s important that the line express its major intent in the first 80 characters or so, though. Anything beyond that should be either further parameters that are of little importance or a continuation of the thought expressed in the beginning of the line.

I think doing it like that makes it easier for the brain to scan (thought 1, thought 2, thought 3, rather than thought 1, thought … er … 1, thought 2).

Reading speed would increase with longer lines, but not comprehension. That’s what the research seemed to say, anyway.

Actually, they didn’t find any statistically significant relationship between # of columns and comprehension:

A 2x3 randomized block ANOVA found no significant main effects or interaction for either justification or number of columns for total comprehension.

Nor was there any statistically significant result for “satisfaction”, i.e., the reader’s opinion of how easy the page was to read. The only statistically significant results were for reading efficiency, which they define as reading speed divided by comprehension scores:

an interaction approaching significance for justification x number of columns was found

Whether the columns are justified or not turns out to be hugely important!

I do think it’s important that the line express its major intent in the first 80 characters or so

Definitely.

The comparison is a little shaky, because you NEVER see code formatted in multiple columns. What we’re really comparing is a single column with long lines, to a single column with short lines. And this study doesn’t cover that…

It does, however, provide some decent evidence that arbitrarily short lines aren’t always superior to long lines.

[Note: I modified my conclusion in the post to clarify this point]

I think we should demand full justification for the VS.NET code editor.

We have standardized on 80 columns because we spend a lot of time reviewing code revision changes side by side using a difference viewer. Viewing differences side by side becomes much easier when the code is written in a narrow column.
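A minimal Ruby sketch of why narrow code helps in a side-by-side viewer, assuming a plain-text display where each pane is a fixed 80 columns wide (an assumed value) and longer lines are simply truncated. `side_by_side` is a hypothetical helper, not part of any real diff tool.

```ruby
# Render two lists of lines side by side. Each left-hand line is cut to
# PANE_WIDTH columns and padded so the gutter lines up; right-hand lines
# are cut to the same width. PANE_WIDTH = 80 is an assumption.
PANE_WIDTH = 80
GUTTER = " | "

def side_by_side(left, right, width = PANE_WIDTH)
  rows = [left.size, right.size].max
  (0...rows).map do |i|
    l = (left[i] || "")[0, width].ljust(width)  # truncate, then pad
    r = (right[i] || "")[0, width]              # truncate only
    (l + GUTTER + r).rstrip
  end
end

old_code = ["total = 0", "items.each { |i| total += i.price }"]
new_code = ["total = 0", "items.each { |i| total += i.price * i.quantity }"]
puts side_by_side(old_code, new_code)
```

With two 80-column panes and a three-character gutter, each row needs 163 columns; anything past column 80 in either file is cut off and cannot be compared at a glance.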

I think we should demand full justification for the VS.NET code editor.

I’m trying really hard to imagine what that would look like… ;-)

Viewing differences side by side becomes much easier when the code is written in a narrow column.

This is a great point. It’s one of my pet peeves, too. Code that spreads out too much horizontally is smelly:

http://blog.codinghorror.com/flattening-arrow-code/

The more I think about this, the more I think that code is such a specialized form of writing that none of the typical reading rules apply.

Keeping your lines short is still an important guideline. But arbitrarily refusing to allow any code to extend past column 80 is unnecessarily inflexible.

[updated post to reflect newly discovered SURL study on line length]

Keeping one’s code within a 78-column limit does have the advantage of being rather a lot kinder to those on 80-column terminals when emailing it about, compared with having lines of code either run off the edge of the screen or wrap in an ugly manner. Though I suppose this is of only trivial importance unless you’re working with rather old-school programmers.

I like your blog because it uses the full width of the browser for the text.

Too bad that Bloglines makes you scroll sideways, because other elements of the feed, such as pictures, increase the virtual width and push some words outside of the view pane.

I just checked the line lengths in my main project, which is written in Ruby, and even I can’t believe how high the percentage is:

```ruby
# Count how many non-blank lines in a Ruby project are shorter than 70 characters.
d = '/home/dewd/workspace/pilar'

sizes = Hash.new(0)  # line length => number of lines with that length

Dir["#{d}/**/*.rb"].each { |f|
  # line.size includes the trailing newline, so it is one more than the visible width
  ts = IO.readlines(f).delete_if { |line| line.strip.size == 0 }.map { |line| line.size }
  ts.each { |sz| sizes[sz] += 1 }
}

total = 0
less_than_seventy_chars = 0
sizes.each { |size, n_occurrences|
  total += n_occurrences
  less_than_seventy_chars += n_occurrences if size < 70
}

perc = (less_than_seventy_chars * 100) / total.to_f

puts '====== Result:'
p 'Percentage of lines with less than seventy chars: %f' % perc
p 'Total of lines: %d' % total
p 'Number of lines with less than seventy chars: %d' % less_than_seventy_chars
```

```
====== Result:
"Percentage of lines with less than seventy chars: 95.753574"
"Total of lines: 33016"
"Number of lines with less than seventy chars: 31614"
```
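The same measurement can be packaged as a reusable method. This is a hypothetical refactoring, not the commenter's code: `short_line_stats` takes the lines directly so it can be tried without a project on disk, and it strips the trailing newline before measuring, so its counts may differ slightly from the script above.

```ruby
# Hypothetical helper: given an array of lines, ignore blank ones and
# report how many are shorter than `limit` visible characters.
def short_line_stats(lines, limit = 70)
  lengths = lines.map(&:chomp).reject { |l| l.strip.empty? }.map(&:size)
  short = lengths.count { |n| n < limit }
  percent = lengths.empty? ? 0.0 : short * 100.0 / lengths.size
  { total: lengths.size, short: short, percent: percent }
end

# Usage against a source tree (the glob is a placeholder):
# lines = Dir["path/to/project/**/*.rb"].flat_map { |f| File.readlines(f) }
# p short_line_stats(lines)
```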

I’m surprised that you’re even comparing text for reading with text for code. I’ve never seen anyone concern themselves with how fast you can read a line of code. Speed is nothing. Comprehension is everything. And comprehension is much more a matter of what’s going on in the line than of how long it is. I would posit that long lines of code are almost always less comprehensible than shorter ones because there’s more than one thing going on in the line: nested function calls, ANDed conditions, whatever. Not always, but mostly.

I must say I’m surprised that justified text tests better than ragged text, mostly because the automatic justification algorithms in browsers and in word-processing software can result in some pretty horrible-looking text. (One way typesetters get justification is to split a lot of words so that there are not yawning fields of whitespace in lines, but no automatic algorithm I’m familiar with tackles the tricky job of splitting words.) I notice this particularly, of course, when reading some sort of technical material that’s full of method names or URLs or whatever: words that are long and unsplittable. Those totally hork up justification.

Incidentally, I must respectfully disagree with piyo; I really dislike text that goes all the way out to the margins. But, you know, different gaps for different chaps and all that.

My preference isn’t for a given line length but for the rule “Try very hard not to allow a line of code to extend past the right edge of the editor viewport, avoiding horizontal scrolling.”

I find that comprehension drops drastically when I can’t see the entire statement at once. Plus, more than once I’ve been bitten in languages that don’t require a statement terminator (VB, etc.) by something off the right edge that I didn’t realize was there.

I’ll manually wrap statements as they approach the right edge. Granted, the number of columns in the viewport will vary based on screen resolution, etc. Other than the “kids” whose eyes can handle 1600x1200 on a 15-inch LCD, most of the people I work with have about the same number of columns available.
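Ruby is another language without a statement terminator, and its trailing `if`/`unless` modifiers make the same trap easy to hit: the condition governing an entire statement can sit past the right edge of the window. A contrived sketch; `purge` and `purge_visible` are hypothetical names.

```ruby
# In a window narrower than about 90 columns, this line appears to clear
# the records unconditionally, because the modifier is off to the right.
def purge(records, user)
  records.clear                                                                          if user[:admin]
  records
end

# Same behavior, but the condition comes first and cannot scroll out of view.
def purge_visible(records, user)
  return records unless user[:admin]  # guard clause
  records.clear
end
```

Both methods behave identically; the guard-clause version simply keeps the deciding condition in the first few columns.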

I personally feel that the longer the line, the harder it is to read and comprehend, but that may just be because shorter lines are generally less important than longer lines. As for my own code, I like having lines of different lengths scattered through it, because that lets me scan the file more quickly for a set of lines that resembles the block of code I was working on. The mind can match shapes faster than text.

Long lines are definitely a problem when you have two files open side by side. Anyway I think code might have more in common with musical notation than the English language.

Some programming languages preach indentation with spaces rather than tabs, e.g., Ruby (2 spaces) and Python (4 spaces).

One way typesetters get justification is to split a lot of words so that there are not yawning fields of whitespace in lines, but no automatic algorithm I’m familiar with tackles the tricky job of splitting words.

Knuth put a lot of work into developing one for English for TeX. Properly rendered fully justified text (as TeX has always done it, and as other document processors are only now starting to do) gives the advantages of both left-justified text and conventional fully justified text.
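For contrast with TeX, here is a sketch of the simple greedy method: fill each line with as many whole words as fit, then spread the leftover width across the gaps. It never hyphenates, so one long unsplittable word (a URL or method name) gets pushed to the next line and leaves wide gaps behind it, which is the problem described above. This is not Knuth's algorithm, which chooses line breaks for the whole paragraph at once; `justify` is a hypothetical name.

```ruby
# Greedy full justification: pack words into lines of at most `width`
# characters, then pad the gaps so every line except the last (and any
# single-word line) is exactly `width` wide. No hyphenation is attempted.
def justify(text, width)
  lines = []
  line = []
  text.split.each do |word|
    # Current width is letters plus one space per gap; adding `word`
    # adds one more space and the word itself.
    if !line.empty? && line.sum(&:size) + line.size + word.size > width
      lines << line
      line = []
    end
    line << word
  end
  lines << line unless line.empty?

  lines.each_with_index.map do |words, i|
    next words.join(" ") if i == lines.size - 1 || words.size == 1
    gaps = words.size - 1
    base, extra = (width - words.sum(&:size)).divmod(gaps)
    # The first `extra` gaps each get one additional space.
    words.each_with_index.map { |w, j|
      j < gaps ? w + " " * (base + (j < extra ? 1 : 0)) : w
    }.join
  end
end

puts justify("the quick brown fox jumps", 11)
```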

There’s a pretty big and basic difference between the length of code lines and that of regular text: long code lines are usually long because they start at a deep indentation level, meaning they have a large amount of whitespace on the left. For the purpose of reading comprehension this whitespace doesn’t count. So if the indentation level is six tabs at four spaces each, your 100-character line actually has fewer than 80 characters of content.

Another point is the recent popularity of very long names for individual code units, as in Java and .NET. A 132-character line in C++ might be incomprehensible because it contains a dozen different statements and operations, but a 132-character line in Java might well contain just a single method call with elaborate namespace and class names. Should we arbitrarily break up single statements because they’re long, even though they’re perfectly readable?

For these reasons, absolute character limits on code lines without regarding the concrete circumstances are pointless.
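A small Ruby sketch of the measurement described above: expand tabs, then compare the full width of a line with the width left after stripping its leading indentation. The tab width of 4 mirrors the "six tabs at four spaces" example and is an assumption; `expand_tabs` and `widths` are hypothetical helpers.

```ruby
# Tab stops every TAB_WIDTH columns (assumed value from the example).
TAB_WIDTH = 4

# Replace each tab with enough spaces to reach the next tab stop.
def expand_tabs(line, tab_width = TAB_WIDTH)
  line.each_char.reduce(+"") do |out, ch|
    if ch == "\t"
      out << " " * (tab_width - out.size % tab_width)
    else
      out << ch
    end
  end
end

# Raw width (indentation included) vs. width of the content alone.
def widths(line)
  expanded = expand_tabs(line.chomp)
  { raw: expanded.size, content: expanded.lstrip.size }
end

p widths("\t" * 6 + "x" * 76)
```

By this measure, a 100-column line indented six tabs deep carries only 76 columns of actual content.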

I suppose it all depends on how trusting you are. If you can be sure that the code to the right of col 79 doesn’t contain surprises or bugs, then fine: just scroll down and watch the structure unfold. (Yeah, right!)

Having to scroll left-right is a big no-no. Painful is not a strong enough word. If it’s such a good thing to do, why aren’t Home and End called PageLeft and PageRight, eh???, eh???!

Seriously though, what we really need is syntax-aware source control. Then you could share code with people whose views were very different to your own. Modern IDEs and refactoring tools are starting to take us in that direction. (Hands up if you use Ctrl-E,D in VS2005)

Or maybe we could just agree to check in in a canonical format. 80 columns, anyone?
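If a team did settle on a canonical 80-column format, a check like the following could run before check-in. It is a hypothetical sketch: it counts tabs as single characters and only reports offending lines, leaving the rewrapping to a person.

```ruby
# Report lines wider than `limit` columns as "path:line: N columns".
# Input is a hash of path => array of lines, so the check also works on
# in-memory data. Tabs count as one character (a simplification).
def long_lines(files, limit = 80)
  files.flat_map do |path, lines|
    lines.each_with_index.map { |line, i|
      width = line.chomp.size
      "#{path}:#{i + 1}: #{width} columns" if width > limit
    }.compact
  end
end

# Usage on a source tree (pass 78 to leave room on 80-column terminals):
# files = Dir["**/*.rb"].map { |f| [f, File.readlines(f)] }.to_h
# puts long_lines(files)
```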

When convenient, I format the code in parallel columns anyway, for example:
(short statement) (or throw exception),
(short statement) (short comment)
both of which ensure that other details do not obscure the normal flow of control.
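A hypothetical Ruby rendering of that parallel-column layout: the main flow reads down the left, while error handling and comments sit in a right-hand column. `parse_port` and its messages are illustrative only.

```ruby
# Left column: the normal flow. Right column: "or raise" checks and comments.
def parse_port(config)
  raw  = config["port"]            or raise ArgumentError, "port missing"
  port = Integer(raw)                                  # raises on non-numeric input
  port.between?(1, 65_535)         or raise ArgumentError, "port out of range"
  port
end

p parse_port({ "port" => "8080" })
```

This relies on `or` binding more loosely than `=`, so `raw = config["port"] or raise ...` assigns first and raises only when the result is `nil` or `false`.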

What I noticed while reading this article is the huge impact of ClearType on readability.
The screenshots of the four examples are not rendered using ClearType, whereas on my computer the rest of the page is. To me, the impact of ClearType seems much bigger than the impact of the column layout.