The Case For Case Insensitivity

LOL… no, but you are just as likely to type in “HmtlTag”.

Yes, I often type “stupid-selfish-bitch” instead of “mother-in-law”. It’s a common mistake. The letters are right next to each other on the keyboard.

re: case sensitivity - I think initially it was performance. It took more work to determine if two strings are equal in a case-insensitive manner. As for why case sensitivity has stuck around, the culprit I’ve seen pointed to most often is Unicode. Apparently, it’s hard to compare case across different languages. Since it would be more work to retrofit all of the compilers, not to mention how much of the world’s code would break if case sensitivity were abolished, it’s left in.

Personally, I think it’s pretty important. Would you rather have “aids” or “AIDS”? But I understand that compilers work differently.

I’ve tried really hard to get upset about this casing thing - and I just can’t do it. It’s just sooooo petty.

Previous comments about this being a ‘scripting’ issue instead of a ‘casing’ issue are right on target. It’s one of many tradeoffs that need to be considered when choosing a language.

Even if you do choose to use a scripted language, it’s not that difficult for an appropriate editor to catch the casing error at edit time. It’s even easier to catch the error with a quick static analysis of the code. If it bothers you that much, you could consider writing your own tool to catch the errors. This wouldn’t be that difficult for PHP or Python.
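To make the "write your own tool" idea concrete, here is a rough sketch of such a check for Python source. It is only an illustration under simple assumptions: the function name `find_case_collisions` is made up for this example, and it only looks at plain variable names (not attributes or function parameters), using the standard `ast` module.

```python
import ast
from collections import defaultdict


def find_case_collisions(source):
    """Group identifiers in Python source that differ only by letter case."""
    spellings = defaultdict(set)
    # ast.parse only parses the code; nothing in `source` is executed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            spellings[node.id.lower()].add(node.id)
    # Keep only the names that show up with more than one casing.
    return {key: names for key, names in spellings.items() if len(names) > 1}


sample = """
HtmlTag = "<b>"
print(htmlTag)
"""

for names in find_case_collisions(sample).values():
    print("inconsistent casing:", ", ".join(sorted(names)))
```

A real tool would also want to respect scopes and report line numbers, but even this much would flag the HtmlTag/htmlTag kind of slip before the script ever runs.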

The last time I checked, the C# keywords weren’t defined in Spanish or German. They’re all English.

Furthermore, I’ve noticed that many foreign language speaking coders tend to stick to English throughout their code, even in the comments and variables. I guess it’s another example of mixture avoidance-- just like avoiding mixtures of case-sensitivity and case-insensitivity. If you’re going to learn the keywords in English, why not go all the way?

When they aren’t typing mistakes. The words used in this sentence have the same meaning whether or not they are capitalized. Is “The” != “the”? Is “When” != “when”?

But in a larger sense, it’s about empirical productivity data: how much time have you spent sussing out problems due to needless case sensitivity? Maybe you’re the rare genius coder who never mistakes “SignOn” for “Signon” in a string somewhere.

Meanwhile, I’m still waiting for someone, ANYONE to provide a single link that provides any kind of evidence that case sensitivity is more productive. All I can find is “I just spent the last hour…” horror stories about case sensitivity.

Which of the following should the scripting engine forgive? At what point do you smack someone upside the head and tell them to just be more careful? In other words, just how badly can someone mess something up and it still be the scripting engine’s fault for not being tolerant enough?

The correct spelling is HtmlTag:

  1. htmlTag
  2. HmtlTag
  3. HTmLTag
  4. Cucumber
  5. htmltag
  6. html
  7. HTMLTAg
  8. CucumberTag
  9. taghtml
  10. hTmLtAg

You get the point… I’m personally not very tolerant of these types of mistakes. I believe that things like “VB’s option strict” and browsers that don’t require well formed HTML are the bane of software development. Just how much time could we have saved over the years if Internet Explorer would simply have shown an error message instead of rendering crappy HTML markup that didn’t even stand a chance of working!!! Make it exact and you don’t have any excuses. Try to guess or raise your tolerance level and programmers get sloppy and bugs get harder to find.

As are ALL of the programming languages, it seems. Even Ruby, which was created in Japan and has a huge following over there. Weird, huh? There’s a difference between keywords and variable names though. The source files are created using the default ANSI codepage, but they can be saved as Unicode (I believe UTF-16) files. So that means you can use identifiers in any Unicode-based language including, I’m assuming, multibyte languages like Japanese, Chinese, and Arabic. EXCEPT in C++; C++ still uses ANSI identifiers.

Argh, I should clarify the above statement. In Visual Studio specifically, VB.NET, C#, JScript.NET, and Java all use Unicode for their source files. I don’t know about other IDEs/languages. I think Xcode does Unicode source files as well.

Oh, on topic.

I don’t think it’s productive, I think it’s a necessary evil right now. My earlier statement was an attempt at a joke. I HATe case sensitivity in an IDE, but I HaTe it even more if the IDE doesn’t help me be consistent by offering to correct my case as VB did.

I try to enforce a casing standard when I write sprocs: T-SQL keywords are uppercased, everything else is lowercased or matches the entity/domain casing. There’s no technical reason for me to do that, but I find it does make me more productive when I have to read over the statement later on. YMMV.

Go look at SAP. Read some of their ABAP code. I hope you know German really, really well. There’s millions of lines of code, all written with really nice German documentation with lots of German variable names with nice English keywords.

I think this is much more of a scripting than casing issue. There are MANY pitfalls of run-time versus compile-time checking, not just case-related typos. You have to take the bad with the good if you are going to use a dynamic language.

But why not learn the tool and use it effectively? Perl has “use strict”, PHP has “error_reporting(E_ALL | E_STRICT)” (PHP 5 at least) and as mentioned previously, VB has “option explicit” (I don’t know of an equivalent for Python).

I think the “hours” of debugging people mentioned could have been cut short by just using these methods.

The point is that in that language it’s not wrong. You personally might not like it, but it is allowed.

Me personally doesn’t like it, and one cool thing about some modern IDEs (not languages, IDEs) is that they highlight inconsistent casing and autocomplete with correct case.

I’d tend to support a coding standard that requires consistent case, because it’s just one more thing that makes reading code easier. Mentally translating HTmlTags to htmlTags every time I see it slows me down and makes me less likely to notice the important changes that are actual bugs, like the use of hmtlTags instead of HTmLtaGS, htMlTags, HTMLTags or whatever other random casing the (ab)user typed.

Moz

So why aren’t programming languages localized, if everything else is (the os, the applications, the web)? What’s so special about a programming language that makes it magically exempt from localization entirely?

Those are specious comparisons; neither of those things is comparable to “HtmlTag” vs “HTMLTag”… unless you consider <BODY> malformed HTML compared to <body>.

The bottom line is productivity. Nobody (so far) can provide any evidence whatsoever that case-sensitivity makes programming any easier or faster, but I can find lots of evidence that it makes programming slower and more painful.

Getting the letters correct in a variable name is hard enough. Case is only a significant difference if you’re a computer, not a human being. The meaning of the word “Cucumber” doesn’t change if I type it “cucumber”. Forcing the developer to use the exact same case or smashing their face in with a compile error is just plain cruel and unusual punishment-- death by Von Neumann.

Plus, case insensitivity has another benefit: it prevents developers from doing things like calling variables in the same function I and i, or discriminating between public and private variables purely by case.

So, do you have any proof that case-insensitivity makes programming easier or faster, other than someone’s (including yours) opinion?

Look, English as a language sucks. There’s so much ambiguity in English as a spoken language that it was inevitable that some of it would spill through into programming.

Even the fact that English has case makes it more complicated than most languages need to be. Perhaps we should all code in Korean, since it has no notion of case, no homonyms, no silent letters, no crazy “i before e, except after c, except for science” rules, is totally phonetic and has very strict grammar rules that make it murderously easy even for the most inept person to learn how to read and write it.

I agree with David’s comments. Let the compiler help you out by setting strict options (if the compiler has them).

Plus, these days the new IDEs include features which highlight possible problems (like casing/variables not declared, etc) even before you compile.

The casing “scandal” is certainly not something that’ll catch you out after you compile and are running. That’s a problem for scripting languages or VB. So what’s the big deal?

I guess the baggage that languages such as C/C++ bring with their case-sensitiveness is why the .NET CLR is case-sensitive. For languages like VB.NET it’s the compiler which has to be made to jump through hoops for the language to remain case-insensitive while the CLR is sensitive about the whole case issue. Then again, VB.NET is no fun at the best of times… http://www.cafepress.com/radioactivecode

Well… I give up. I still say that this is entirely a scripting language issue. No one has been able to show that there is any productivity increase or decrease with either of the two methods.

Given that this issue doesn’t impact me in the least, I will just chalk it up to yet another reason not to use scripting languages. Just make sure you remember it the next time someone is trying to tell you that scripting languages make programmers “much more productive” than compiled languages. Yeah… uh huh.

Now this is funny: I blogged about just how worthless this debate is about a month ago, when I had to endure yet another “It’s great/It sucks” debate by my junior programmers. So much passion, wasted.

http://www.enginefour.com/blogs/shawn/2005/10/worthless-programmer-debate-1-case.html

I suspect that all the people complaining here about how ugly case-insensitive code looks, or how good coding standards should require consistent variable naming, have never actually tried using a case-insensitive language with a good IDE that standardises the casing automatically.

Case insensitivity does NOT need to mean that variables are shown as you type them, as Gary’s example clearly shows.

So you could write HTMLTag at the top of your sourcefile, HtmlTag in the middle, and at the end you are too tired of typing to hit the shifts so you are conveniently allowed to type htmltag.
Would you do this?

Absolutely. That’s what I do all the time. In VB.NET I only type HTMLTag once, when declaring the variable, and thereafter I type htmltag and let the IDE change it to HTMLTag automatically.

I totally agree with Jeff: case sensitivity has no upside, and is just a waste of my time.

“And nobody seems to dispute the idea that having the compiler auto-declare variables for you is a bad one.”

Says who? I’ve seen idiots out there who dispute this as well.

"I’m still waiting… but in all seriousness, this is my point. "

So you don’t accept my argument/example which states that you can’t have true variable naming conventions without also requiring specific casing? For example, what good is it to say that variable names must start with the type if people can all use the following to indicate the same variable name?

InTeGeRDAYSoLd
integerDaysOld
INTEGERDAYSOLD
IntegerDaysOld

You’ve got the most important case against case insensitivity there is… consistency. What more do you need?

And I’d be interested in seeing the empirical evidence that shows the true “cost” of case sensitivity vs. case insensitivity. I would be willing to bet that over the past 30 years, the benefit of reduced compile times (consider ALL of the compiles that have been done by EVERY developer using Java, C, C++, C#, etc.), and reduced maintenance costs from consistent naming conventions, will FAR outweigh the miscellaneous debugging nightmares being experienced by a VERY small percentage of the programming population. It just HAS to be the case that case sensitivity has benefitted the many much more than it has hurt the few.

One of my naive 1997 rants is on this very topic. I’m a Linux/Unix/Win/OSX guy and still oppose almost all instances of case sensitivity. I’ve been bitten many times; but OTOH I’ve looked smart sometimes when I’ve been able to tell a colleague “oh, it should be ‘X11’, not ‘x11’; that’s the problem”.

For programming environments stuck with it, code databases, search and autocomplete can mitigate the problem.

Compound words in mashed-together variable names, er, “compound” the problem: ShutDownPort vs. ShutdownPort, that sort of thing. Mainly because of this I’ve gone to all lowercase identifiers with underscores (which ironically will not fix the example above). But a reasonably smart syntax checker could say “Hey, your case is borked here, and you actually declared it this way; fix?”
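For what it’s worth, the “your case is borked, fix?” check could be sketched roughly like this in Python. The function `suggest_casing` and the list of declared names are hypothetical, just for illustration; a real checker would pull the declarations out of the parsed source.

```python
def suggest_casing(declared, used):
    """Return the declared spelling if `used` differs from it only by case, else None."""
    by_folded = {name.casefold(): name for name in declared}
    match = by_folded.get(used.casefold())
    if match is not None and match != used:
        return match
    return None


declared = ["ShutdownPort", "HtmlTag"]
for typed in ["ShutDownPort", "htmltag", "HtmlTag", "HmtlTag"]:
    fix = suggest_casing(declared, typed)
    if fix:
        print(f"{typed!r}: declared as {fix!r}; fix?")
```

Note that it stays quiet about “HmtlTag”: that one is a transposed-letter typo, not a casing problem, so it would still need an ordinary undeclared-variable check.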

Case sensitivity tends to infect cross-platform stuff and force things its way. You can’t fix C++ without breaking linkers and code. You can’t even have a case-insensitive web server for a case-insensitive filesystem; multiple URLs pointing to the same file might still be considered distinct by search engines, analysis tools, proxies, firewalls, etc.

I still think case-insensitive-but-preserving local filesystems are the way to go.

Unicode introduces problems for case-insensitive collation and matching: http://www.unicode.org/unicode/faq/casemap_charprop.html
Some characters either don’t have matches in the other case, or many may map to one, or a case switch may even change string length (think of the ‘fl’ ligature, the German sharp s, etc.).
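A quick way to see the length-changing cases is to try them in Python 3, whose `str.upper()` and `str.casefold()` follow the Unicode case mappings. This is only a small demonstration; the helper `same_ignoring_case` is a name invented for the example.

```python
# Case mapping is not always one character to one character.
sharp_s = "\u00df"      # the German ß
fl_ligature = "\ufb02"  # the single-character 'fl' ligature

# Uppercasing either one yields a two-character string.
print(sharp_s.upper(), len(sharp_s), len(sharp_s.upper()))
print(fl_ligature.upper(), len(fl_ligature), len(fl_ligature.upper()))


def same_ignoring_case(a, b):
    """Compare using Unicode case folding instead of plain lowercasing."""
    return a.casefold() == b.casefold()


# Case folding treats these as equal; naive lowercasing does not.
print(same_ignoring_case("Stra\u00dfe", "STRASSE"))
print("Stra\u00dfe".lower() == "STRASSE".lower())
```

So a “simple” case-insensitive compare has to decide which of these behaviours it wants, which is part of why retrofitting it onto identifiers is messier than it sounds.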

Interesting: ‘ls’ in Linux has been case insensitive for some time now.