# Sorting for Humans : Natural Sort Order

The default sort functions in almost every programming language are poorly suited for human consumption. What do I mean by that? Well, consider the difference between sorting filenames in Windows explorer, and sorting those very same filenames via `Array.Sort()` code:

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html

This discussion is funny.

The problem isnât with sorting. The problem is with the data structure. If you want a hierarchy use one. Make a tree and sort the levels of the tree.

Take this shopping list for an example:

White Milk
Chocolate Milk

Itâs a list. If i sort the list iâll get:
Chocolate milk
White Milk

Whaaah /cry the user says 'cause milk is seperated and not sorted propery!
Whaaah /cry the developer says âget used to itâ this is the way computers have worked since the iron age.

the problem isnt the sort, the problem is the list. What we want is a Tree:

Wheat
White
Milk:
Chocolate
White

Then when we sort we sort each level of the tree.

When you sort the strings:

dumbuser10
dumbuser2
dumbdeveloper10
dumbdeveloper2

If you expect the result:

dumbdeveloper2
dumbdeveloper10
dumbuser2
dumbuser10

Then you are expecting the tree:

levels of the tree.

Take this shopping list for an example:

White Milk
Chocolate Milk

Itâs a list. If i sort the list iâll get:
Chocolate milk
White Milk

Whaaah /cry the user says 'cause milk is seperated and not sorted propery!
Whaaah /cry the developer says âget used to itâ this is the way computers have worked since the iron age.

the problem isnt the sort, the problem is the list. What we want is a Tree:

Wheat
White
Milk:
Chocolate
White

Then when we sort we sort each level of the tree.

When you sort the strings:

dumbuser10
dumbuser2
dumbdeveloper15
dumbdeveloper3
smartdeveloper1

If you expect the result:

dumbdeveloper3
dumbdeveloper15
dumbuser2
dumbuser10
smartdeveloper1

You are expecting a tree:
dumb:
developer:
3
15
user:
2
10

smart:
developer:
1

So fricken use a tree user and developer alike.

omg sorry about that post, i guess iâm dumb user 2 'cause i cut and pasted all fooked

ânatural sort groups numerical substringsâ

Yes, but if ânaturalâ means âthe way humans do itâ, it has to do much more and handle other popular ordered substringsâŚ

Days of week:
â˘tuesday
â˘wednesday
â˘thursday

Date/time:
â˘15-8-2004
â˘1-10-2004
â˘1-1-05

â˘5:00 AM
â˘14:00
â˘4:00 PM
(special tricky for 12:00-1:00 AM/PM)

â˘9, James street
â˘10, James street
â˘10 bis, James street
â˘10 ter, James street
â˘5, Rudolph street

Product reference, or anywhere digits are not numerical, like phone ânumberâ:
-01K305
-1K205

0001
F01A
f01b
(0-9 and A-Z/a-z in straight order)

Version:
â˘v1.9
â˘v1.10
â˘v1.10.1
(a period is not a decimal separator)

Sorting with or without leading article (Mr, Miss, St, âTheâ etc).

etc etc

And then there is the issue that some sorts depend on locale.
And then there is the issue that some strings were built with other, untold locale (like filenames).
And then there is the issue that some strings are ambiguous and require looking at the entire set.
And then there is the issue that some sets are fundamentally ambiguous.
And then there is the issue that since the sort depends on the set, it is not stable nor predictable.

If you want to make good human-like sortâŚ good luck.

And in the case of the version numberingâŚ thats a tree:

v:
â1:
----9:

Windows Explorer in your example sorts by file name, not by file name with extension. Hence the difference.

If you apply Array.Sort() to file names only â you would get result that is pleasant for users.

I have both learned to use padding zeroes in the filenames, as well as written thesorting a couple of times in my profession.

HmmâŚ I bet thereâs an ISO standard for this somewhere. If people liked communicating across borders more, theyâd be less exited about localization and more into internationalization. There is, seriously, no need to have a local way or writing dates. But then again, what do you expect from people who donât use the metric systemâŚ

Musaran, I agree with you totally - natural sorting is not well-defined. I was simply describing how most natural sort algorithms are implemented.

Aaron G, funny you should say nobody needs to sort decimal numbers as a sub-string. This natural sort algorithm actually supports sorting decimal sub-strings by NOT ignoring leading zeroes (although that behaviour can be disabled):
http://sourcefrog.net/projects/natsort/
Before you say âwell, thatâs just some random guyâs sort routineâ, he claims that his code has been incorporated into PHP:
http://us3.php.net/manual/en/function.strnatcmp.php

Actually, this whole discussion boils down to a central issue in human factors: How Software Works (well-defined algorithms) versus What Users Want (fuzzy, ill-defined specifications).

Once AI is smart enough to figure what users really want, weâll all be out of jobs. (The code will write itself).

Daniel âDangâ Griffiths:

Didnât your teacher tell you youâd go blind if you continued to âEmbrace the Pythonâ.

Maybe you could try âpolishing the Rubyâ?

On the topic of this post:

One sort approach is never going to work in all situations. Let your users try out the sort with various filenames etc, get feedback from them, change the sort to behave as close to their expectations as possible. Easy really.

Ash wrote:
âOne sort approach is never going to work in all situations. Let your users try out the sort with various filenames etc, get feedback from them, change the sort to behave as close to their expectations as possible. Easy really.â

Aaron G wrote:

âAs if youâd ever need to sort hexadecimal in a system with actual users. If you find yourself encountering this issue frequently, itâs a strong indication of awful UI design and potentially terrible back-end design.â

Peterh wrote: âIf people liked communicating across borders more, theyâd be less exited about localization and more into internationalization.â

Hey Jeff, first time commenting.

Great blog, your recent blog about the 80/20 actually inspired me to start my own blog.

Saw this one and have to give it my own try:
Quick Java Comparator:
http://simplesql.tigris.org/servlets/ProjectDocumentList?folderID=0

Cheers!

@Tom:
âWhat? Really? Hex?
Not only do only only you and DEAD people know Hex, but most of them donât use it that regularly.â

Thatâs not the point. The point is what is the natural order for hex strings? The ânatural orderâ is under defined, you canât have it in the language (or in its standard library).
Only the programmer can decide what is the natural order. No, the user canât, the user sees only what sheâd like you have to take into account all your users.

@ Aaron G:

"If some of you are really so concerned about this obscure edge case, put in an heuristic that checks for a string composed of only decimal digits and same-case letters A-F, with at least two characters from each category. "

Oh yesâŚ rightâŚ lets put there an heuristcs so that we use two different ordering depending on the elements of the set. WaitâŚ what appens if only some strings are hexadecimal?

âJesus guys, it IS important to consider edge cases, but that doesnât mean you should scrap the whole idea if a few of those edge cases will produce incorrect results. You just make sure that youâve defined this behaviour, and set expectations accordinglyâ

No, this is the problem. A lexicographical ordering is the same everywhere. The ânatural orderâ is not, you canât have it as a function in the programming language, it depends on the domain.

If you want to show a sorted list of strings to the user you have to choose one ordering. If, depending on the domain, there is an ordering that is alwayscorrect* you should implement that. If the list of strings is general enough you have to choose an ordering and screw some of your users. The lexicographical ordering has the undeniable advantage of screwing the users in a predictable fashion with a solid mathematical background.
The natural order is just something you pulled off your @ss, subject to change when you find a âbugâ.

E[X], well whether you like it or not, strnatcmp seems to be standard in PHP:
http://ca.php.net/strnatcmp

Just because ânatural sortingâ is ambiguous, doesnât mean someone canât make up a âstandardâ for it.

@Will
well PHP is also the language where âno one needs namespacesâ, calling that a standard is quite a bit far fetched. Is just the way they think it should be.

E[X] wrote: âwell PHP is also the language where âno one needs namespacesâ, calling that a standard is quite a bit far fetched. Is just the way they think it should be.â