I use -- all the time too. Sometimes it gets converted into a dash by whatever software I’m using, sometimes not. Either way, the message is clear. The double-hyphen is a pretty well-known internet convention for expressing a “dash”, similar to asterisks for bold or underscores for underlining.
Unicode is da BOM
It’s a follow-up to a previous post on internationalisation, “Does your software pass the Turkey Test?”, or at least an addition to it. Handy things for software developers to know. That’s why it was blogged about.
I just wish people knew UTF-16 was a variable-width encoding, like UTF-8. It is not a fixed-width encoding like UTF-32! Why do so many people not know that?
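For anyone who wants to check the variable-width claim themselves, here is a minimal Python sketch. The helper name encoded_widths is made up for illustration; it just counts the bytes each character takes in the three encodings. The little-endian codec variants are used so that no BOM is added to the counts.

```python
def encoded_widths(ch):
    """Return (utf8, utf16, utf32) byte counts for a single character.

    Hypothetical helper for illustration only.
    """
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # explicit byte order, so no BOM is prepended
        len(ch.encode("utf-32-le")),
    )

# The last sample lies outside the Basic Multilingual Plane,
# so UTF-16 needs a surrogate pair (two 16-bit units) for it.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

Only the UTF-32 column stays constant at 4 bytes. The UTF-16 column is 2 bytes for the first three characters and 4 for the last one, which is exactly what "variable-width" means here.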
“Mainly because they have to be escaped in XML …”
WTF? Does some tired old blogger need to write an article titled, “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About XML (No Excuses!)” for you?
Single-alphabet minded bastards…
Why do you think anyone finds it funny?
For extra credit: what is the BOM?
Really, that question is not a rhetorical question; you expect me to answer, “This is funny because the box represents a Unicode value displayed with an incorrect encoding scheme.”
Maybe the rhetorical question is what kind of person am I that finds this funny.
rhetorical question n. A question to which no answer is expected, often used for rhetorical effect.
No answer is expected because every developer worth his or her salt should already know why it’s funny-- no explanation required.
It’s funny because, fingers crossed, someday UTF-8 will take over the world and we’ll never have to read another article like “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” again. Text will flow seamlessly anywhere and in any language. And that will be a beautiful day. And these slogans will be a badge of honor to those of us who had to deal with crap like installing an auxiliary library and learning a whole new API just to be able to output a different language.
I agree Unicode should be implemented in all software. No Excuses! It’s not difficult…
Windows is a problem in that cutting and pasting between standard Windows apps fails… Why does copying a Unicode string from an HTML editor to Word to Outlook not work when they are all written by Microsoft?
The one I love is your comment “As it turns out, Windows-1252 can be a better default for web strings than UTF-8”
This is false except on Windows, because Windows seems to make a distinction between ASCII text and Unicode text, since it uses UTF-16.
If it used UTF-8 then (at least in America and the UK) ASCII (0-127) and UTF-8-encoded Unicode are the same …
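A quick way to see this point, sketched in Python with made-up variable names: pure ASCII text produces identical bytes under the ASCII and UTF-8 codecs, while UTF-16 pads every character to two bytes. The ASCII range is the 128 code points 0 through 127.

```python
text = "Hello"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # explicit byte order, so no BOM

# For pure ASCII text the two encodings agree byte for byte.
print(ascii_bytes == utf8_bytes)

# UTF-16 spends two bytes per character, even for plain English.
print(len(utf8_bytes), len(utf16_bytes))

# The same holds for every one of the 128 ASCII code points (0 through 127).
all_ascii = bytes(range(128))
round_trip = all_ascii.decode("ascii").encode("utf-8")
print(round_trip == all_ascii)
```

So software that only ever sees ASCII cannot tell ASCII from UTF-8, whereas the same text in UTF-16 is a different byte sequence altogether.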
On one hand, I understand the need to support characters outside the standard American alphabet. On the other hand, I don’t want to do a lot of extra work for the 99% of the apps that I write that have a 99% North American audience. On the gripping hand, perhaps knowing the ASCII charset will come in as handy in the next 20 years as knowing EBCDIC has in the last 20 years.
It’s funny because people are answering the rhetorical question (including me now!).
I remember the day I spent 10 hours fixing the question mark bug that a user reported. argh
How about this one:
I#65533;Unicode
Ole
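The number 65533 in that slogan is the decimal code point of U+FFFD, the Unicode replacement character. As a rough illustration of how it ends up on screen, the Python sketch below decodes Windows-1252 bytes as if they were UTF-8. The function name decode_lossy and the sample string are assumptions for the example, not anything from the post.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def decode_lossy(raw, encoding="utf-8"):
    """Decode bytes, substituting U+FFFD for anything that is not valid.

    Hypothetical helper for illustration only.
    """
    return raw.decode(encoding, errors="replace")

# Write the text in one encoding...
raw = "caf\u00e9!".encode("cp1252")  # the e-acute becomes the single byte 0xE9

# ...and read it back assuming a different one.
garbled = decode_lossy(raw, "utf-8")

print(ord(REPLACEMENT))        # the number from the slogan
print(REPLACEMENT in garbled)  # the accented letter did not survive
print(decode_lossy(raw, "cp1252"))  # decoding with the right codec works
```

Once the replacement character has been substituted, the original byte is gone, which is why these bugs are so tedious to chase after the fact.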
But Jeff, why do you still avoid using the goodness of Unicode?
For instance, why use ‘--’ when there is ‘—’ available?
So, am I not supposed to read the link that you sent us to before answering about the BOM?
But, we use UTF-8. In UTF-8, you don’t need the BOM.
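For the extra-credit crowd, a small Python sketch of what the BOM (byte order mark, U+FEFF) looks like in practice. It relies only on the standard codecs module; the variable names are made up. In UTF-8 the BOM is optional because there is no byte order to signal, but some tools write it anyway, and Python's "utf-8-sig" codec strips it on decode.

```python
import codecs

payload = "Unicode"

# Some tools prepend a BOM to UTF-8 files even though it is not required.
with_bom = codecs.BOM_UTF8 + payload.encode("utf-8")
print(with_bom[:3])  # the three BOM bytes, EF BB BF

# The plain utf-8 codec keeps the BOM as a leading U+FEFF character...
plain = with_bom.decode("utf-8")
print(repr(plain[0]))

# ...while utf-8-sig removes it if present.
stripped = with_bom.decode("utf-8-sig")
print(stripped == payload)

# In UTF-16 the BOM does real work: it tells the reader the byte order.
utf16 = payload.encode("utf-16")  # the generic codec writes a BOM first
print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```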
I guess his — key is broken.
Great article.
Even greater use of “on the gripping hand” in Chris Chubb’s comment!