Filesystem Paths: How Long is Too Long?

I recently imported some source code for a customer that exceeded the maximum path limit of 260 characters. The paths in question weren't particularly meaningful, just pathologically long, with redundant subfolders. To complete the migration, I renamed some of the parent folders to single-character values.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/11/filesystem-paths-how-long-is-too-long.html

What would be interesting is to actually hear someone from Microsoft comment on why this limit still exists in their brand-new Vista system, even though the NTFS filesystem supports paths of up to roughly 32,000 characters.

They also said they completely reworked the Explorer shell, so I don’t understand why they kept this limit.

Of course the obvious answer would be backwards compatibility, but couldn’t that have been solved by imposing the limit only on programs that use the compatibility modes?
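
For what it’s worth, the Win32 API already has an escape hatch: prefix an absolute path with \\?\ and call the wide (Unicode) functions, which then accept paths up to roughly 32,767 characters. Here’s a minimal C# sketch of the idea, assuming P/Invoke into kernel32 (the long path shown is hypothetical):

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class LongPathDemo
{
    // The wide (Unicode) Win32 call; the "\\?\" prefix turns off the
    // MAX_PATH check, so paths up to ~32,767 characters are accepted.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFileW(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    const uint GENERIC_READ    = 0x80000000;
    const uint FILE_SHARE_READ = 0x00000001;
    const uint OPEN_EXISTING   = 3;

    static void Main()
    {
        // Hypothetical path that would exceed MAX_PATH without the prefix.
        string longPath = @"\\?\C:\some\very\deep\folder\tree\file.txt";

        using (SafeFileHandle h = CreateFileW(longPath, GENERIC_READ,
            FILE_SHARE_READ, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
        {
            Console.WriteLine(h.IsInvalid
                ? "Open failed, error " + Marshal.GetLastWin32Error()
                : "Opened a file the default MAX_PATH check would reject.");
        }
    }
}
```

The catch is that every program touching the file (Explorer, your editor, your build tools) has to opt in, which is presumably why the default limit survives.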

This is actually starting to annoy me the more I think about it; Vista was supposed to be reworked almost from the ground up.

C#? WHO IS THIS? WHAT HAVE YOU DONE WITH JEFF?!?!?

:stuck_out_tongue:

I’ve come across this on a previous job. The project was a knowledge management system that kept track of previous consulting jobs they worked on (think of it as the IT department of “the Bobs” from the movie Office Space, and watching that movie was actually a job requirement in that department). It ran on classic ASP (this was in 2001).

Upon migrating to a new server, we discovered that accessing some of the 70k documents popped up a dialog with three boxes: domain, user, pwd. My first guess, based on the first problem we found, was that some of the files were owned by the LOCAL admin account on the now-decommissioned server. Oops: the ACLs contained SIDs that the new server didn’t know. OK, write a script to run through the file system, look for files with either an unknown owner or a local account, and take ownership with a domain account.

Problem solved? Hah. Next it turned out that since many of the offices of this multinational insulting company used pesky ferrner words, those silly accents and slashes through letters caused good old Yankee classic ASP to barf. So, next: run through and identify those pesky European words. That took care of a thousand or so files that were popping up the same dialog with three boxes: domain, user, pwd.

Problem solved? Not a chance. By now the screaming idiots were screaming to fire my fanny perpendicular, until I found that the remaining files had paths well in excess of 250 characters. I think the record was near 600. Darn those Germans for having long long long words. And darn that company for having a multilingual standard for keeping them organized. And darn that loudest crazy screaming idiot for using the filesystem to rearrange the files into places that the web front end could not manage.

The Explorer shell back in the Win98 days could handle 32K-character paths, but the standard API calls all used the ANSI/ASCII variants instead of the “wide” ones that could handle that newfangled Unicode stuff.

My point, and I’m pretty sure I had one when I started, is that this problem isn’t new, and it isn’t limited to the .NET Framework. It is one of those pesky limitations you end up with when the creators of your filesystem and OS assume that everyone uses only ASCII and that nothing outside of America exists at all.
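
To make that A-vs-W split concrete, here’s a hedged sketch of how the same Win32 function is exposed in both flavors when declared from C# (both entry points are real; only the wide one speaks full Unicode and accepts \\?\ long paths):

```csharp
using System.Runtime.InteropServices;

class AnsiVsWide
{
    // ANSI entry point: strings are converted through the local code page,
    // so accented and non-Latin characters can be mangled or rejected.
    [DllImport("kernel32.dll", EntryPoint = "GetFileAttributesA",
        CharSet = CharSet.Ansi, SetLastError = true)]
    internal static extern uint GetFileAttributesAnsi(string path);

    // Wide entry point: strings stay UTF-16 end to end, and the "\\?\"
    // prefix unlocks paths far beyond MAX_PATH.
    [DllImport("kernel32.dll", EntryPoint = "GetFileAttributesW",
        CharSet = CharSet.Unicode, SetLastError = true)]
    internal static extern uint GetFileAttributesWide(string path);
}
```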

In any case, it seems like a potential security hole to have variations in the API. Would an antivirus program, for example, pick up a virus that hides in a path longer than MAX_PATH (259+1 characters)?

I guess that would depend on which specific API calls the antivirus scanner uses.

Moon/DL: One advantage of using a special C# program: both of your methods build a list in memory, then sort it and spit out the result. The problem with that is, when you run it against a file share, it will eventually kill your machine. Jeff’s routine does one thing, holds the largest path it comes across in a variable, and displays it at the end, so it has a consistent memory requirement of ~15MB. I would guess the *nix command requires just as much memory as the PowerShell command. I have been running the PowerShell command for a few minutes now and it’s using ~150MB. That’s 10x the memory usage of the C# program.

Oh well, even though the PS command is going to cripple my machine when I run out of memory, at least it fits on one line. It just goes to show that the most efficient tools are the ones written for a single purpose. :smiley:
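
For reference, a single-purpose scanner along those lines is only a couple of dozen lines of C#. This isn’t Jeff’s exact code, just a sketch of the constant-memory approach:

```csharp
using System;
using System.IO;

class LongestPath
{
    static string longest = "";

    static void Scan(string dir)
    {
        try
        {
            // Only the current winner is kept in memory.
            foreach (string file in Directory.GetFiles(dir))
                if (file.Length > longest.Length)
                    longest = file;

            foreach (string sub in Directory.GetDirectories(dir))
                Scan(sub);
        }
        catch (UnauthorizedAccessException) { /* skip unreadable folders */ }
        catch (PathTooLongException) { /* ironically, System.IO gives up here */ }
    }

    static void Main(string[] args)
    {
        Scan(args.Length > 0 ? args[0] : @"C:\");
        Console.WriteLine(longest.Length + " " + longest);
    }
}
```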

I guess you could argue for “refactoring” your folder structure if you approach MAX_PATH, much along the same lines as is good practice when we write source code.

The last time I ran into this issue was with a home-brewed backup util which basically copies (selections of) my C: and D: partitions to a NAS unit (X:). If X: is set up to mirror these partition hierarchies under, e.g., X:\backup\week43\C-drive\, I run into trouble with certain .NET and various other files.

So I guess a 512-character path limit would not hurt; after all, most of us have also violated the de facto 80-characters-per-line limit in source code files.

Here’s how my old abused iBook clocks in:
346 /Users/ejt/Library/Application Support/SyncServices/Local/clientdata/0063006f006d002e006100700070006c0065002e006900530079006e006300440055002e00450053004e003500310038003200310034003300370031/0063006f006d002e006100700070006c0065002e006900730079006e0063002e00760061006c00750065004d006900670072006100740069006f006e0054006f006900530079006e006300320032
I got a few more hits from ‘SyncServices’, and the next runners up are from system header files, e.g.:
231 /System/Library/Frameworks/IOBluetoothUI.framework/Versions/A/Resources/Japanese.lproj/Documentation/Reference/IOBluetoothServiceBrowserController/Classes/IOBluetoothServiceBrowserController/IOBluetoothServiceBrowserController.html

Terrier: I don’t know what you’re smoking, but have you noticed flash (aka non-moving-parts) drives still use hierarchical file systems? The file system is an abstraction – it has nothing to do with the underlying hardware. If you think you have something better (like some vague concept where you “search” for everything, a la longhorn – ewww) then we’d love to hear your bright idea.

And I love that the Windows crowd jumps on their little PowerShell thing, and then it runs into the API limitation and can’t actually report the path if there’s one longer than MAX_PATH. Hilarious. I wouldn’t be surprised if the cygwin one did too. Don’t even get me started on cygwin – doesn’t properly support gdb, can’t do development on it, and the console UI sucks… a shadow of a real system.

Finally, I think it makes a lot of sense for applications to use the filesystem as a database… generally users don’t need to go poking around in the internals of data directories, but if they do need to, it’s a lot easier to navigate the file hierarchy than to try to find/write code to parse some custom file format that has all the various resources jumbled together in an unknown way.

Take Keynote bundles for instance – looks and acts like a single file, but if you want, you can “View Contents” and swap an updated figure or play videos directly without ever having to launch the application. Very handy.

A lot of work has gone into allowing the filesystem to handle a variety of operations efficiently; to throw that away and try to reimplement your data storage as some kind of flat file just doesn’t make sense. But I guess maybe it does if you live in a world of MAX_PATH idiosyncrasies.

Aaron G: trying to paint the backward compatibility of *nix/OS X as a shortcoming is not the best point to pick.

Under *nix, since all the system sources are available, if you need compatibility you can compile the older libraries and have your binary blob link and run against those. Voilà.

Under OS X, Apple has bent over backwards to support older code: most applications from System 7 still ran in OS 9, and most OS 9 code will still run within “Classic” in OS X. Not everything, but I’d say it’s comparable to the odds that some Windows 3.1 app is going to run in Vista.

And looking ahead, notice that my system headers are versioned: OS X can retain the old libraries each time it is updated, so future upgrades won’t break older apps.

And perhaps you haven’t used *nix in a while, but most software can be updated by a package manager, so people don’t have to deal with Makefiles.

A final key is that both *nix and OS X have found ways to unshackle future development without utterly breaking backward compatibility. On its current path, Windows will just keep getting bigger and more bloated as it tries to retain and work around every idiosyncrasy.

Once more, Jeff Atwood proves that UNIX > Windows.

I can find the longest path on *my* filesystem by typing

sudo find / | awk '{print length($0) " " $0}' | sort -n | tail -n 1

I don’t need to fire up a text editor and compiler to do that; in fact, I don’t need help from Visual Anything. And I don’t need to tell people to keep their paths short because I run a two-bit operating system.

Oh yeah… That shell command finished running on my Mac Mini; it took just two minutes. It probably helped that it’s got a 7200 RPM drive. Here’s my longest filename:

298 /Users/luna/restore/usagi/Library/Application Support/SyncServices/Local/clientdata/0063006f006d002e006100700070006c0065002e005300610066006100720069/005400720061006e0073006c00610074006f0072002d004c006500670061006300790043006f006e006400750069007400420075006e0064006c00650050006100740068004b00650079

It’s a crazy binary blob, but so what?

Maybe it’s just me, but on Linux I like filling up directories with filenames like:

windowssucks
windowssuckS
windowssucKs
windowssucKS
windowssuCks
windowssuCkS
…

WTF? Why are you coding in C# now? :frowning:

Ethan: Under *nix, since all the system sources are available, if you need compatibility you can compile the older libraries and have your binary blob link and run against those. Voilà.

Try telling that to my dad. You will get an even more glazed look staring back at you than the one I had when I read that statement. USERS SHOULD NOT HAVE TO COMPILE SOURCE CODE!

Tim: I agree that Jeff’s program uses fewer computer resources than mine. On the other hand, mine took less human time to write: it took me 30 seconds to write a first draft; I discovered that it didn’t handle filenames with spaces correctly, which took another 30 seconds to fix.

My program is fast enough, at least on my system: it scans about half a million files on my Mac in two minutes.

Yes, there are more efficient algorithms than sorting for finding the largest member of a set. In the Unix style, I could write a small filter that implements such an algorithm in awk, Perl or C; then I’d have a new function in my library that I can compose into efficient pipelines for other jobs.
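
As a sketch of that idea (written here in C# to match the rest of the thread, though awk or Perl would do just as well): a filter that reads paths on stdin and keeps only the current maximum, so it composes into a pipeline without the memory cost of a full sort:

```csharp
using System;

// Reads lines (e.g. piped from `find /`) on stdin and prints the longest.
// Constant memory: only the current winner is retained.
class LongestLine
{
    static void Main()
    {
        string longest = "";
        string line;
        while ((line = Console.ReadLine()) != null)
            if (line.Length > longest.Length)
                longest = line;
        Console.WriteLine(longest.Length + " " + longest);
    }
}
```

Pipe find / into it and it replaces the sort -n | tail -n 1 stage.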

My point is that any good Unix admin can knock off scripts like this in a few minutes. It makes our jobs a lot easier. Powershell will bring this kind of capability to Windows admins, and that’s a good thing.

Sailor Moon, since you are a troll I guess I shouldn’t answer, but once again the *nix nerds seem to fail to understand that there are several ways of doing things.

You could just as well install something like Windows PowerShell and get the same thing without writing a program for it. Or you could create a simple script.

Did you really think you need Visual Studio to do anything in Windows? Surely you are not that ignorant.

In this case he chose to do it this way; what is the problem? The problem is only that you need to keep telling yourself that your poor choice of system is somehow superior.

Sailor Moon, you are indeed very clever. In spite of the fact that the code is clearly a console app, you still made a point of the word “visual” in the title. Clearly you are ignorant of the facts and have taken the “Unix systems are better by default” approach that so many *nix fanboys seem to be constantly bugging me with. I find it interesting that you find the visual aspect so appalling; I’d love to know what your actual objection to the process is. I also wonder if you’ve ever written anything in C# or .NET to actually provide a basis for your objections.

One day we might live in a world where people can just use the right technology for the right job without this rabid brand loyalty that everyone seems to have. Why not just accept that it’s possible to live in a world where you use Windows for some things and *nix or anything else you desire for other things? To say that anything related to Windows is inferior (or conversely that anything in another system is superior) simply because you do or don’t like the system is just ignorant and a somewhat counterproductive attitude.

Sucks to be a Windows user. The longest path on my system here at work (Linux) is currently 338 characters. Maybe you should go get yourself a real OS.

I have a couple of questions regarding the Unicode vs. local drive limits. At work we mimic the region/territory structure of our real estate portfolio on a shared drive for document storage purposes (document management is a battle still being fought). Some of the folder structures can be quite lengthy, because they mimic our CRM system so the application can go find folders for leases, etc. Our help desk told them that their path had to be less than 205 characters long, and certain files wouldn’t open for them until they shortened the path. Now, I never “mapped” a drive; I connect via UNC path, but I didn’t seem to have the same problem. Is UNC the same as the Unicode path, since UNC originally came from UNIX?

Lots of people that can’t read properly, apparently. As pointed out in the article, the NTFS filesystem can support much longer paths, so this limit is imposed by the API, probably because of backwards compatibility.

I am surprised, though, that they let this creep into Vista as well; that decision must have been made for backwards compatibility too. It also raises questions about how much they actually changed during the past five years of Vista’s development.

Tim: Your dad doesn’t care about backward compatibility with some ancient custom “business critical” cruft. It’s a moot point.

It’s not like Average Joe is going to want Netscape 4.07 or some other random outdated program. They want the most up to date stuff at the time of installation so they can remain compatible with current resources for as long as possible.

What is an issue for Joe Average is having to buy faster hardware to upgrade their software, which is what Vista will require. New versions of Linux and OS X don’t bloat their hardware requirements each time a release comes out; on the contrary, OS X has run better on my same hardware with each revision.

The people who care about running some specific old binary POS while moving to newer/faster hardware that isn’t supported by the old system it was written for are usually dealing with something like an internal corporate resource that only works with a particular program no one can update. And in that situation, you’ll have a big company with people who do understand how to compile; they’ll set up the libraries once and then remotely distribute a package. So in this case, Average Joe doesn’t even see an installer, much less have to compile anything.