Of Spaces, Underscores and Dashes

I try to avoid using spaces in filenames and URLs. They're great for human readability, but they're remarkably inconvenient in computer resource locators:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/04/of-spaces-underscores-and-dashes.html

This_is_a_single_word
but this-is-multiple-words
- Jeff Atwood

Although the second statement is true for almost all Windows applications, the first statement is NOT always true, especially for Microsoft Office products (Word, Outlook) and even Windows text box dialogs.

You can test this by typing the examples above into a Word document, an Outlook email, a text box, etc., and searching for one individual token of the expression, e.g. "word". In these applications, the search finds "word" in both the hyphenated AND the underscored versions of the expression.

Also, Ctrl-Left-Arrow/Ctrl-Right-Arrow moves to the first/last character of each token in both versions. In applications that treat the underscored version as a single "word", by contrast, Ctrl-Left-Arrow/Ctrl-Right-Arrow advances only to the first/last character of the entire expression.

By my reading, this interpretation (dash as a word-break character, but not underscore) carries over into the Unicode world. Unicode defines the rules for word boundaries here:
http://www.unicode.org/reports/tr29/tr29-9.html#Word_Boundaries

(Just for reference, in Unicode terminology the underscore is called "LOW LINE", and the ASCII dash is called "HYPHEN-MINUS". There are several other hyphen characters; see the link above.)
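Python's standard `unicodedata` module reports these official names directly. A small sketch; the `char_names` helper and the choice of sample characters are illustrative, not from the thread:

```python
import unicodedata

# ASCII underscore and dash, plus a few of the other hyphen/dash
# code points Unicode defines (chosen for illustration).
CANDIDATES = ["_", "-", "\u2010", "\u2011", "\u2013", "\u2014"]


def char_names(chars):
    """Map each character to its official Unicode name (hypothetical helper)."""
    return {c: unicodedata.name(c) for c in chars}


for ch, name in char_names(CANDIDATES).items():
    print(f"U+{ord(ch):04X}  {name}")
# U+005F  LOW LINE
# U+002D  HYPHEN-MINUS
# U+2010  HYPHEN
# U+2011  NON-BREAKING HYPHEN
# U+2013  EN DASH
# U+2014  EM DASH
```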

"the first statement is NOT always true, especially for Microsoft Office products (Word, Outlook) even Windows text box dialogs."
As a developer, this always drives me nuts. I expect a text box to treat underscores and dashes the way my IDEs do: an underscore as a letter, and a dash as "punctuation". In programming, of course, this makes sense. An underscore is not an "operator" in many (most?) common programming languages, unlike the dash (minus) or whitespace (symbol separator).
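Python's own tokenizer illustrates the IDE behavior described above: an underscore stays inside the identifier, while a dash is lexed as the minus operator. A sketch using the standard `tokenize` module; the `lex` helper is hypothetical:

```python
import io
import token
import tokenize


def lex(source):
    """Return (token type name, text) pairs for NAME and OP tokens in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (token.NAME, token.OP)
    ]


# An underscore is part of the identifier; a dash is the minus operator.
print(lex("snake_case\n"))  # [('NAME', 'snake_case')]
print(lex("kebab-case\n"))  # [('NAME', 'kebab'), ('OP', '-'), ('NAME', 'case')]
```

The same distinction shows up in `"snake_case".isidentifier()` (True) versus `"kebab-case".isidentifier()` (False).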

However, the underscore really has no official place in the English language. It was created for the typewriter, as a way of underlining. Its word-separation usage only came with the advent of 2GLs. So the confusion is understandable, if not really acceptable, when dealing with word-processing functionality.

At any rate, it would be nice if everyone treated it the same way. At least within a given context.

Underscores are also ugly and hard to type quickly, whereas dashes are easy and the same width as a space, or only slightly wider. Underscores, at 2-3 spaces wide, make words look disconnected, even if they're otherwise less obtrusive.

Of course, you could simply use + as the separator and ban literal + characters in filenames, with appropriate server support. (Or be prepared to encode them as %2B.)
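If a literal + does end up in a name, percent-encoding it avoids any server reading it as a space. A sketch using `urllib.parse.quote` from Python's standard library; the `encode_path_segment` helper is hypothetical:

```python
from urllib.parse import quote, unquote


def encode_path_segment(name):
    """Percent-encode one URL path segment (safe="" also escapes '/')."""
    return quote(name, safe="")


encoded = encode_path_segment("C++ notes")
print(encoded)           # C%2B%2B%20notes
print(unquote(encoded))  # C++ notes
```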

I'm-more-concerned-about-which-one-is-more-readable.

I'm_more_concerned_about_which_one_is_more_readable.

I tend to prefer the underscore because it's below the letters, as if in an underline. But I don't think there's a right answer. Just preference.

Another difference between underscores and dashes: very often, you can't see underscores when the words are underlined (as in most hyperlinks). Sometimes this is an advantage, but in most cases it isn't.

Two little notes from a Unix-ish perspective (I thought I'd share, because I'm reading your blog to get the Windows-ish perspective):

\w is not portable regex; it comes from the Perl-style (preg, i.e. Perl regex) set.
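POSIX bracket expressions spell this class [[:alnum:]_] instead. Python's `re` module has no POSIX classes, so this sketch compares \w with the explicit ASCII class [A-Za-z0-9_]. Note that \w itself varies between engines: in Python 3 it is Unicode-aware by default unless `re.ASCII` is passed.

```python
import re

WORD = re.compile(r"\w+")                   # Perl-style shorthand (Unicode-aware here)
ASCII_CLASS = re.compile(r"[A-Za-z0-9_]+")  # spelled-out ASCII equivalent
WORD_ASCII = re.compile(r"\w+", re.ASCII)   # \w restricted to ASCII

# All three agree that "-" separates words and "_" does not.
for pattern in (WORD, ASCII_CLASS, WORD_ASCII):
    print(pattern.findall("this_is-two"))  # ['this_is', 'two']

# They disagree on non-ASCII letters, which is one portability trap.
print(bool(WORD.fullmatch("café_menu")))         # True
print(bool(ASCII_CLASS.fullmatch("café_menu")))  # False
print(bool(WORD_ASCII.fullmatch("café_menu")))   # False
```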

Also, spaces in file names, especially in arguments to shell scripts, are easily lost: once inside the script, evaluating a string more than once makes it "fall apart", quoted or not.

The solution there is the magic variable "$@" (quotes included), which expands to all positional arguments, each properly quoted on its own.
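A Python snippet can't run the shell itself, but the standard `shlex` module implements POSIX-shell-like quoting and word splitting, which is enough to show the failure mode. Roughly, `parsed` below is what "$@" preserves, while `naive` is what an unquoted expansion does to the same name. The file name is made up for illustration.

```python
import shlex

filename = "My Documents/a b.txt"

# Naive whitespace splitting: what an unquoted expansion does to the name.
naive = f"cp {filename} /tmp".split()

# shlex.quote() adds shell quoting; shlex.split() applies POSIX shell
# word-splitting rules, so the quoted name survives as one argument.
command = f"cp {shlex.quote(filename)} /tmp"
parsed = shlex.split(command)

print(naive)    # ['cp', 'My', 'Documents/a', 'b.txt', '/tmp']
print(command)  # cp 'My Documents/a b.txt' /tmp
print(parsed)   # ['cp', 'My Documents/a b.txt', '/tmp']
```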

I like this one better:

I'MMOR~1

I remember a quote somewhere, something like:

8 chars ought to be enough for anybody. :slight_smile:

nah, sorry, too low… :slight_smile:

We switched from _ to - a few years ago for usability reasons: some folks don't see an _ in a URL and assume it's a space.

Here's another vote for '.' as a word separator in filenames (does UNICODE call that a FULL STOP? - actually, it does).

I have no in-depth analysis of why; it's just what I have always used to avoid the evils of spaces in filenames.

this-is-a-long-filename
this.is.a.long.filename
this_is_a_long_filename

I still like it better.
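All three variants can be generated from one title. A minimal sketch, assuming ASCII titles and the common (but not mandatory) slug convention of lower-casing everything; `slugify` is a hypothetical helper:

```python
import re


def slugify(title, sep="-"):
    """Lower-case `title` and join its alphanumeric runs with `sep`.

    Anything that isn't an ASCII letter or digit (spaces, punctuation,
    existing separators) is treated as a word break, so an apostrophe
    splits "don't" into "don" and "t".
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return sep.join(words)


title = "This is a long filename"
for sep in ("-", ".", "_"):
    print(slugify(title, sep))
# this-is-a-long-filename
# this.is.a.long.filename
# this_is_a_long_filename
```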

This was really useful. I'm working on a file to convert search-friendly URLs into a keyword search on my website. I was going to use underscores, but now I'm going to use dashes or underscores. Thanks.
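Turning a slug back into search keywords is mostly a matter of splitting on the separator. A sketch that accepts either dashes or underscores, so both styles of URL map to the same keywords; the helper name is hypothetical:

```python
import re


def slug_to_keywords(slug):
    """Split a URL slug on runs of dashes and/or underscores.

    Empty pieces (from leading/trailing separators) are dropped.
    """
    return [word for word in re.split(r"[-_]+", slug) if word]


print(slug_to_keywords("of-spaces-underscores-and-dashes"))
print(slug_to_keywords("of_spaces_underscores_and_dashes"))
# both print ['of', 'spaces', 'underscores', 'and', 'dashes']
```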

I, too, hate the spaces! You don't know how badly I've wanted to change "Program Files" to ProgramFiles or Program_Files.

That being said, what about camelCase/PascalCase? It seems like that would be another option for this discussion. I wonder how the readability changes.
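For comparison, camelCase and PascalCase are easy to produce from a list of words. A sketch with hypothetical helper names, assuming the input is a non-empty list of plain words:

```python
def to_camel(words):
    """camelCase: first word lower-case, the rest capitalized."""
    first, *rest = [w.lower() for w in words]
    return first + "".join(w.capitalize() for w in rest)


def to_pascal(words):
    """PascalCase: every word capitalized, no separator."""
    return "".join(w.capitalize() for w in words)


words = "program files".split()
print(to_camel(words))  # programFiles
print(to_pascal(words)) # ProgramFiles
print("_".join(words))  # program_files
print("-".join(words))  # program-files
```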

The space/dash/underscore self-deliberation comes up frequently when I'm ripping CDs to MP3s and trying to get a good filename. I've settled on using _ for spaces and - as a delimiter.
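That scheme (underscores inside a field, dashes between fields) stays machine-parseable as long as no field contains a dash itself. A sketch with hypothetical helpers and made-up track data; flattening an in-field dash to an underscore is my own choice here, and it loses that detail on the way back:

```python
import os.path


def track_filename(*fields, ext=".mp3"):
    """Build Artist-Album-NN-Title style names (hypothetical helper).

    Whitespace inside a field becomes '_' and fields are joined with '-'.
    A '-' already inside a field is flattened to '_' so the delimiter
    stays unambiguous.
    """
    cleaned = ["_".join(f.replace("-", " ").split()) for f in fields]
    return "-".join(cleaned) + ext


def parse_track_filename(name):
    """Split a name built by track_filename back into its fields."""
    stem, _ext = os.path.splitext(name)
    return [field.replace("_", " ") for field in stem.split("-")]


name = track_filename("Some Artist", "Some Album", "01", "Opening Track")
print(name)                        # Some_Artist-Some_Album-01-Opening_Track.mp3
print(parse_track_filename(name))  # ['Some Artist', 'Some Album', '01', 'Opening Track']
```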

I guess I would argue that the _ makes more sense as a space, but maybe that's because that's what I've always used.

I agree that it's nice to adhere to a standard, but if there are good reasons to throw the standard out the window, why not consider it? Not that I've come up with good reasons…

Renaming "Program Files" can break some archaic installers/updates, so I wouldn't risk it.

I personally just add a junction (a kind of symbolic link) that maps Program Files to Program_Files, so both point at the same place on an NTFS disk, and I can then refer to it either way. Very handy :slight_smile:
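The effect of such an alias can be sketched portably with a directory symlink standing in for the NTFS junction; on Windows, `mklink /J Program_Files "Program Files"` in cmd creates an actual junction, while `os.symlink` there may need admin rights or Developer Mode. Everything happens in a throwaway temp directory, and the `read_through_alias` helper is hypothetical:

```python
import os
import tempfile
from pathlib import Path


def read_through_alias():
    """Write via a space-free alias, read back via the real name."""
    with tempfile.TemporaryDirectory() as root:
        real = Path(root) / "Program Files"
        alias = Path(root) / "Program_Files"
        real.mkdir()
        # Directory symlink as a stand-in for an NTFS junction.
        os.symlink(real, alias, target_is_directory=True)

        (alias / "readme.txt").write_text("same folder")
        return (real / "readme.txt").read_text()


print(read_through_alias())  # same folder
```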

Interestingly, I've heard Vista will be dropping all the two-word directory names, moving from Program Files to Programs, My Documents to Documents, etc.

P.S. A utility to help create junctions: http://www.sysinternals.com/Utilities/Junction.html

Jeff, that's terrible advice. If programmers don't use the same characters in their filenames that their users use, then how is software ever going to work properly?

W3C's CSS validator is broken because somebody didn't think to check it with a URL that has a %20 in it: http://www.kirit.com/W3C%27s%20CSS%20validation%20service

Microsoft's Response.Redirect() is broken because it can't work out which bit of a URL is which, also due to the sloppy encoding practiced by most web developers: http://www.kirit.com/Response.Redirect%20and%20encoded%20URIs

And again, a problem with encoding means that 404 handlers on IIS are broken too: http://www.kirit.com/Errors%20in%20IIS%27s%20custom%20404%20error%20handling
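One common way this kind of sloppy encoding goes wrong is double encoding, where an already-encoded string is encoded again and %20 turns into %2520. This sketch just illustrates that failure with `urllib.parse`; it is not a claim about the exact cause of the bugs linked above.

```python
from urllib.parse import quote, unquote

original = "W3C's CSS validation service"

once = quote(original)
twice = quote(once)  # the bug: encoding an already-encoded string

print(once)            # W3C%27s%20CSS%20validation%20service
print(twice)           # W3C%2527s%2520CSS%2520validation%2520service
print(unquote(twice))  # W3C%27s%20CSS%20validation%20service (still encoded)
```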

And the fact that you've reverse-engineered Google's software doesn't seem a good reason either. Sooner or later they'll change their implementation, and then where will you be? If Google treats a hyphen as a space, then Google is broken. Sooner or later they'll fix that (one hopes).

One comment suggests '+', as Technorati uses in tags? That's even more broken: '+' is used as a space substitute in query strings but NOT in file specifications, and nearly every URL parser I've found gets that wrong.
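Python's `urllib.parse` keeps the two rules apart: plain `unquote` leaves '+' alone, while form-style decoding (`unquote_plus`, `parse_qs`) turns it into a space. The URL below is a made-up example; the last line shows the bug of applying query-string rules to a path.

```python
from urllib.parse import parse_qs, unquote, unquote_plus, urlsplit

url = "http://example.com/tags/c+plus/page?q=spaces+in+names"
parts = urlsplit(url)

# Path: '+' is an ordinary character, so plain unquote() leaves it alone.
print(unquote(parts.path))       # /tags/c+plus/page
# Query string (form encoding): '+' means space.
print(parse_qs(parts.query))     # {'q': ['spaces in names']}
# Applying form decoding to the path is the bug described above.
print(unquote_plus(parts.path))  # /tags/c plus/page
```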

"If you use an underscore '_' character, then Google will combine the two words on either side into one word."

Personally, I feel this is Google's problem, not mine. But that's just my opinion. I will only go so far to accommodate Google. (And it shows; their cache of my site is a mess, because my CMS throws insane session cookies at Googlebot in the URL's query string. I haven't gotten around to fixing it yet… see previous comment.)

Here's another vote for '.' as a word separator in filenames (does UNICODE call that a FULL STOP? - actually, it does).

One problem with the period is that it makes it difficult to figure out where the file extension begins, e.g.:

this.is.a.html.file.html
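Python's path helpers show the ambiguity: by convention the extension is whatever follows the last dot, but nothing in the name says which of the other dots are word separators.

```python
import os.path
from pathlib import PurePath

name = "this.is.a.html.file.html"

# Both take the text after the LAST dot as the extension...
print(os.path.splitext(name))   # ('this.is.a.html.file', '.html')
print(PurePath(name).suffix)    # .html
# ...but every dot looks like a suffix boundary to PurePath.suffixes.
print(PurePath(name).suffixes)  # ['.is', '.a', '.html', '.file', '.html']
```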

Spaces in URLs are bad because they have to be replaced with that ugly, unreadable %20 notation. But what exactly is wrong with spaces in file names? As long as you don't need to share a file with backwards Unix systems that don't understand file names with spaces, I don't see the problem. Where do you ever enter a file name?

  1. In a file selection dialog. Standard Windows file dialogs don't care about spaces. No quotes are necessary.

  2. On the command line. Auto-completion automatically puts quotes around your file name as necessary.

  3. In a program's source code. Strings must be surrounded by quotes anyway, ergo spaces are not a problem.

  4. In some text storage facility, such as the registry or an XML file or whatever. Any well-designed storage format respects embedded spaces, so once again no problem.

The only problematic situation I can come up with (other than sharing with Unix systems) is batch files. That's the only time you have to consciously remember to use quotes for file names with spaces. And even that's no longer true when you upgrade to Windows PowerShell!
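Point 3 can be seen from Python: when a program passes arguments as a list rather than as one command string, no shell re-parses them, so embedded spaces need no extra quoting. A sketch; the path is made up and the child process simply echoes its first argument:

```python
import subprocess
import sys

path = r"C:\Program Files\My App\settings file.ini"

# Each list element is passed as one argument; no shell re-parses it,
# so the embedded spaces survive intact.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", path],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # C:\Program Files\My App\settings file.ini
```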

C'mon guys! Either computers work for you or you work for them.

Stick with spaces.

There_is_a_reason_that_we_don't_write_our_sentences_like_this. There-is-also-a-reason-that-we-don't-write-our-sentences-like-this. And.this.is.a.joke!

I'll let you hardcore filename worshipers use dashes, dots, and underscores. I'll use spaces.

You forgot the Wiki style: no spaces!

ThereIsAReasonThatWeDon'tWriteOurSentencesLikeThis

Which I suppose is what other posters were referring to with "Camel Case style", but I think of this as Wiki-style…
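Going back from a WikiWord to separate words is harder than with an explicit separator, because case changes are the only clue. A regex-based sketch; the pattern is a heuristic of my own, and it keeps acronym runs such as "HTML" together:

```python
import re

# An acronym run (not followed by a lower-case letter), a capitalized or
# lower-case word (apostrophes allowed), or a run of digits.
WIKI_WORD_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z']+|\d+")


def split_wiki_word(name):
    """Split a CamelCase/WikiWord name into its words (hypothetical helper)."""
    return WIKI_WORD_PART.findall(name)


print(split_wiki_word("ThereIsAReasonThatWeDon'tWriteOurSentencesLikeThis"))
# ['There', 'Is', 'A', 'Reason', 'That', 'We', "Don't", 'Write', 'Our',
#  'Sentences', 'Like', 'This']
print(split_wiki_word("ParseHTMLFile2"))  # ['Parse', 'HTML', 'File', '2']
```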