Flaws of Hierarchical File System

Posted on May 13, 2010 by Luke Maciak tein.co/5279

The hierarchical file system we have right now is great. The file/folder/directory metaphor works very well and lets you organize files on disk in a way that makes them easy to find. The tree data structure at the heart of this design makes software that interfaces with it easy to implement and well understood. But it has some fundamental flaws. Let me give you an example.

Let’s say you have an extensive collection of movies and you want to organize the shit out of them so that they are easy to find. How would you go about it in a hierarchical file system? How do you split them up?

By genre?
By lead actor?
By director?

This is not a trivial choice because it will affect how you will retrieve movies in the future. If you organize them by genre, you will have no easy way to find all the movies with your favorite actress. If you categorize them by actors then you lose the genre information. What we have here is a non-hierarchical data set that we are trying to shoe-horn into a hierarchical data structure.

Let me give you another example. Let’s say you are the Organized Natalie Portman guy and you come across a picture of Natalie Portman together with Scarlett Johansson. Where do you file it: in your Natalie Portman folder or in your Scarlett Johansson folder? Or both? I’m not even going to mention pr0n collections, which tend to be organized by participants and the activities they perform (usually more than one per scene).

The point is that there is a lot of data out there that cannot easily be filed in a hierarchical system, because each item has more than one attribute by which it could be categorized. The only way to fit such data into that model is to give up on all but one of those attributes.
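One way to picture the mismatch: a tree gives each file exactly one parent folder, but the movie collection really needs a many-to-many mapping between tags and files. Here is a minimal Python sketch of such a tag index, an inverted index from each tag to the set of files carrying it. The catalog entries and the `key:value` tag format are made up for illustration; this is not how any particular tagging FS stores its data.

```python
from collections import defaultdict

# Hypothetical catalog: each file carries several "key:value" tags at once,
# which is exactly what a single folder path cannot express.
CATALOG = {
    "Ace Ventura Pet Detective.avi": {"genre:comedy", "actor:jim carrey"},
    "The Matrix.avi": {"genre:action", "genre:sci-fi",
                       "actor:keanu reeves", "actor:carrie-anne moss"},
    "Speed.avi": {"genre:action", "actor:keanu reeves", "actor:sandra bullock"},
}


def build_index(catalog):
    """Invert file -> tags into tag -> set of files."""
    index = defaultdict(set)
    for filename, tags in catalog.items():
        for tag in tags:
            index[tag].add(filename)
    return index


def query(index, *tags):
    """Return the files that carry every one of the given tags."""
    if not tags:
        return set()
    # .get() avoids inserting empty entries into the defaultdict
    matches = [index.get(tag, set()) for tag in tags]
    return set.intersection(*matches)


index = build_index(CATALOG)
print(sorted(query(index, "actor:keanu reeves")))
print(sorted(query(index, "genre:action", "genre:sci-fi")))
```

The same file turns up under an actor query and a genre query without being copied or linked anywhere, which is the property the folder tree lacks.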

Of course, you could try to be fancy: create multiple hierarchies and use links and/or shortcuts to tie them together. For example, the file is stored in the “Comedy” folder, but you can also get to it from the “Natalie Portman” folder, and so on. Unfortunately, this creates a good deal of overhead. When you delete a file, the change usually does not cascade to its links, so you may need to remove them manually. In other words, maintaining such a web of hierarchies is much more work than maintaining a single file tree.
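To see the "deletes don't cascade" problem concretely, here is a small Python sketch. It assumes a POSIX-style system where `os.symlink` works without special privileges (on Windows, creating symlinks may require elevated rights). It files a movie under "Comedy", links it from "Natalie Portman", deletes the original, and then scans the tree for links whose target is gone. The folder and file names are hypothetical.

```python
import os
import tempfile


def find_dangling_links(root):
    """Return paths (relative to `root`) of symlinks that point at nothing."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it,
            # so a broken link is a link that does not "exist".
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(os.path.relpath(path, root))
    return sorted(dangling)


def demo():
    with tempfile.TemporaryDirectory() as root:
        comedy = os.path.join(root, "Comedy")
        portman = os.path.join(root, "Natalie Portman")
        os.makedirs(comedy)
        os.makedirs(portman)

        movie = os.path.join(comedy, "movie.avi")
        open(movie, "w").close()
        os.symlink(movie, os.path.join(portman, "movie.avi"))

        os.remove(movie)  # deleting the real file leaves the link behind
        return find_dangling_links(root)


print(demo())  # the stale link left under "Natalie Portman"
```

Running a sweep like this every so often is exactly the kind of extra maintenance the multi-hierarchy approach demands; the file system will not do it for you.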

If you remember, the fabled WinFS that was supposed to ship with Vista was designed to solve this exact problem. It was a database-driven FS that was supposed to keep track of relationships between your files and allow for complex queries. Unfortunately, it turned out to be vaporware. There are several open source projects out there that aim to accomplish something similar, but they are mostly proof-of-concept affairs. None of them has any mainstream traction, and no Linux distro ships with such an FS out of the box.

As cool as a relationship-based FS sounds, the idea does not seem to have any traction in dev communities. For one, it is overly complex. Traditional hierarchical file systems have been around for years, and they have been working just fine. Scrapping them now to embrace a new paradigm seems radical and risky; it is better to extend them to do what we want than to reinvent the wheel. People just feel uneasy “querying” a database for their files rather than traversing a tree. That is despite the fact that everyone wants their files indexed and put in a database anyway, as evidenced by the popularity of Google Desktop Search and its poor imitation, the Windows search available in Vista and Win7.

We already have half of the problem figured out. Can’t find a movie? Search for it. Why traverse a tree if you can just do a lookup?

Hell, if you use a clever file naming convention (e.g. Ace Ventura: Pet Detective [genre: comedy][actor: jim carrey].avi), you can have your category problem all worked out for you. Double points if your OS supports attaching custom meta tags to files.
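A naming convention like that is easy to parse back out with a script. Below is a minimal Python sketch that pulls `[key: value]` pairs out of a file name. The bracket syntax follows the example above; also accepting `=` as a separator is an assumption of this sketch, added because Windows does not allow `:` in file names.

```python
import os
import re

# "[key: value]" or "[key=value]"; the "=" form is an assumed alternative
# for systems (such as Windows) that forbid ":" in file names.
TAG_PATTERN = re.compile(r"\[(\w+)\s*[:=]\s*([^\]]+)\]")


def parse_tags(filename):
    """Split a tagged file name into (title, {key: [values, ...]})."""
    stem, _extension = os.path.splitext(filename)
    tags = {}
    for key, value in TAG_PATTERN.findall(stem):
        # A key may repeat, e.g. several [actor: ...] entries.
        tags.setdefault(key.lower(), []).append(value.strip())
    title = TAG_PATTERN.sub("", stem).strip()
    return title, tags


title, tags = parse_tags(
    "Ace Ventura: Pet Detective [genre: comedy][actor: jim carrey].avi")
print(title)  # Ace Ventura: Pet Detective
print(tags)   # {'genre': ['comedy'], 'actor': ['jim carrey']}
```

Feeding the parsed tags into a lookup structure gives you the "search instead of traverse" workflow without any special file system support.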

Windows 7 seems to have another piece of the puzzle in the form of libraries. Win7 libraries are basically virtual folders that store pointers to files living elsewhere on the file system. They let you group files into categories without actually moving them or messing with links. I haven’t used this feature myself because I do not currently own Win7, but it does sound like a step in the right direction.

The Linux ecosystem seems to be doing something similar with tagging and search. For example, GNOME users now have access to the MetaTracker project, which lets users add meta-tags to their files and has a built-in search engine that indexes them, implementing something very similar to libraries. KDE has Strigi, which is being combined with the NEPOMUK framework to create a similar feature set.

It will be interesting to see where this trend leads us. Of course, we have to take into account another trend converging with the search+tagging movement: cloud-based storage is gaining more acceptance every day. A few years ago it would have been inconceivable to store your files on some remote server. What if there is a network outage? What if they delete it by accident? What if I need it quickly and don’t have time to wait for it to download?

Today most people feel that storing files in the cloud is more reliable than storing them on their own hard drive. After all, they have already gotten used to the idea that every few weeks some virus will ravage their OS and they will have to make nice with some geek to get it reinstalled. Having Google or some other service store their files remotely seems like a great idea. Not to mention that these systems already have powerful search features built in. So we might eventually get that relational FS after all, only it will be a remote file system living in the cloud.

Any Windows 7 users here? How do you like the library feature? Is it useful or rather limited? Have you used it to organize your files, or are you one of those people who indiscriminately keep everything in the root of the Documents folder? How about the Linux flavors? I use KDE mostly, but haven’t really messed around with Strigi.

This entry was posted in Uncategorized.

13 Responses to Flaws of Hierarchical File System

Jereme Kramer says:

May 13, 2010 at 11:10 am

I personally haven’t used Windows 7 much, but I have a friend, who is no luser, who does. Judging by the way I’ve seen him locate movies and such on his drives (his tree starts at the drive level, continues through the partitions, and goes on down from there), searching through the different places they might be until he finds them, the libraries feature either isn’t useful or its existence isn’t obvious.

I have few enough files in my directories that the only subdirectories needed are for music. Fortunately, Exaile and iTunes keep track of them for me.

As my video collection grows, I’d probably be happier with some frontend to mplayer that has a database the way music players do rather than a tagging system or some feature of the filesystem.

IceBrain says:

May 13, 2010 at 11:48 am

I’ve tried a couple of tagging file systems (based on FUSE, for example), but the problem is that they force me to tag things manually, while there are ordinary apps, each specific to a certain kind of media, that do a much better job.

For example, when I have an MP3 file of Enter Sandman, I might tag it manually with “Metallica” and “Enter Sandman”.
But if I use Picard, it can recognize the song automatically and tag it appropriately with multiple tags (Album, Publisher, Year).

Likewise, for movies I use GCStar. When I have a new movie, I just enter a few words of the title, and it loads the director, actors, year, genre, critic review and even a thumbnail from multiple sites (usually IMDB).

For pictures of pretty women (I’m a Japanese Idol photo collector, with 13GB :P ) I use Picasa (Non-free, noes!), which after a few manually tagged faces, can automatically recognize them in new photos, even with multiple faces per photo.

My point is, until tagging FS can provide some kind of plugin functionality to enable auto-tagging, I can’t be bothered to manually organize my stuff so neatly. I’m a busy* man!

* read: lazy

Nathan says:

May 13, 2010 at 11:55 am

NTFS supports hard links. You can download a utility for that. So long as the files are all on the same volume, it will appear as though your pic is in both the Natalie and Scarlett folders even though it’s physically stored only once on the disk.

IceBrain says:

May 13, 2010 at 12:02 pm

@ Nathan: That fails the “cascade delete” requirement, though.
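Both halves of that exchange are easy to check. The Python sketch below (file and folder names are hypothetical) hard-links one picture into two folders with `os.link`, which works on POSIX file systems and on NTFS, then deletes it from one folder. The link count drops from 2 to 1 and the other name survives: a hard link avoids storing the data twice, but it does nothing for cascading deletes.

```python
import os
import tempfile


def hard_link_demo():
    """Return (link count before delete, survivor exists, link count after)."""
    with tempfile.TemporaryDirectory() as root:
        natalie = os.path.join(root, "Natalie")
        scarlett = os.path.join(root, "Scarlett")
        os.makedirs(natalie)
        os.makedirs(scarlett)

        original = os.path.join(natalie, "pic.jpg")
        with open(original, "wb") as f:
            f.write(b"not really a jpeg")

        twin = os.path.join(scarlett, "pic.jpg")
        os.link(original, twin)  # second name for the same data on disk
        links_before = os.stat(original).st_nlink

        os.remove(original)  # removes one name; the data stays reachable
        survivor_exists = os.path.exists(twin)
        links_after = os.stat(twin).st_nlink
        return links_before, survivor_exists, links_after


print(hard_link_demo())
```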

Jason Scheirer says:

May 13, 2010 at 12:34 pm

This is funny, because the BeOS file system had tags (and really a very nice file metadata scheme, period) and super fast search/index almost 15 years ago.

Kim Johnsson says:

May 13, 2010 at 2:22 pm

@Luke: I use Win7 as my primary OS and I’ve played around with Libraries a bit. They’re not particularly awesome. In short, it’s just like hardlinking a bunch of folders into a “library”. You can’t link files, you can’t link folders based on criteria, and when you’re done you’re still bound by the hierarchy of whatever folders you include. Oh, and if you include a folder, all of its files and subfolders will be included. No way of limiting it to certain filetypes.
So basically it’s just a way to aggregate a bunch of folders at different places in the filesystem. Which, sure, could be useful, for example if you have music on several partitions and want it all in the same place. But it doesn’t really solve the problems with an HFS, unless you wanna put every file that needs a different tag in a folder of its own -_-
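Described that way, a library boils down to a merged listing of several folder trees. A rough Python sketch of the idea follows. `library_files` is a hypothetical helper, not the Windows libraries API, and the optional extension filter is the kind of thing Kim says Win7 libraries do not offer.

```python
import os
import tempfile


def library_files(folders, extensions=None):
    """List every file under several folders as one flat collection.

    `extensions` is an optional set such as {".mp3"}; None keeps everything.
    Each included folder brings all of its subfolders along with it.
    """
    results = []
    for folder in folders:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                ext = os.path.splitext(name)[1].lower()
                if extensions is None or ext in extensions:
                    results.append(os.path.join(dirpath, name))
    return results


def demo():
    # Two temporary directories stand in for music on two partitions.
    with tempfile.TemporaryDirectory() as part1, \
            tempfile.TemporaryDirectory() as part2:
        os.makedirs(os.path.join(part2, "albums"))
        for path in [os.path.join(part1, "song1.mp3"),
                     os.path.join(part2, "albums", "song2.MP3"),
                     os.path.join(part2, "cover.jpg")]:
            open(path, "w").close()
        everything = library_files([part1, part2])
        music = library_files([part1, part2], extensions={".mp3"})
        return (sorted(os.path.basename(p) for p in everything),
                sorted(os.path.basename(p) for p in music))


print(demo())
```

Note that the result is still just a list of paths inside the original folders: the aggregation does not add any tags of its own, which is why it does not solve the HFS problem.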

@IceBrain: I agree, some sort of auto-tagging would be vital for the success of a tag-based FS. There are lots of ways it could be implemented, though, some of which should be rather trivial. As long as the actual tagging is exposed (maybe via some FS utility to tag/untag files), writing a script that, for example, reads ID3 tags and tags the files appropriately would be fairly easy.
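The simplest version of such a script could look like the Python sketch below. It only reads ID3v1, the fixed 128-byte block at the end of an MP3 file ("TAG", then 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a genre byte). Most modern files also or instead carry ID3v2 at the start of the file, which needs a real library such as mutagen. The `tags_for` helper and its `key:value` output are hypothetical; they stand in for whatever input a tag-FS utility would actually accept.

```python
import os
import struct

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre byte.
ID3V1_FORMAT = "3s30s30s30s4s30sB"  # 128 bytes in total
ID3V1_SIZE = struct.calcsize(ID3V1_FORMAT)


def read_id3v1(path):
    """Return title/artist/album/year from an ID3v1 tag, or None."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        block = f.read(ID3V1_SIZE)
    magic, title, artist, album, year, _comment, _genre = struct.unpack(
        ID3V1_FORMAT, block)
    if magic != b"TAG":
        return None

    def clean(raw):
        # Fields are padded with NUL bytes (or spaces); Latin-1 is assumed.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {"title": clean(title), "artist": clean(artist),
            "album": clean(album), "year": clean(year)}


def tags_for(path):
    """Hypothetical: turn ID3v1 fields into "key:value" tags for a tag FS."""
    info = read_id3v1(path)
    if info is None:
        return set()
    return {f"{key}:{value.lower()}" for key, value in info.items() if value}
```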

@Nathan: It’s also not very convenient. I should know, I tried organizing my porn with it XD (I’ve mostly given up on locally stored porn now though in favour of *tube sites)

Kim Johnsson says:

May 13, 2010 at 2:23 pm

Also, dunno why I’m suddenly posting under my real name XD

/freelancer

Owen says:

May 13, 2010 at 7:12 pm

Given my propensity for (anally-retentively) tagging everything correctly in my music collection, I’d like to be able to apply this to other things such as images. Let’s say I have some holiday snaps from a trip to, as an example, the UK. I might have a folder called Holiday-UK-2010 or something similar, but what I’d love to be able to do is tag each image individually with metadata, so that I could simply open Spotlight (on my MBP) or something similar on my PC running XP, type in a few words, and have it search through the tags. Done.

Say I have a photo of myself standing in front of the pillars in Cardiff where Torchwood is based (huge Dr Who and Torchwood nerd here) under a very blue sky. Then I could tag it with, say, “me, cardiff, uk, torchwood, blue, docks” and so on: a bunch of things in the image itself, so that while my hierarchical directory listing stays intact, I’m able to grab something immediately by typing in a few tags.

This could also work well for those amusing pictures you download from the internet; the image might be called “3452524.jpg” but be a picture of a guy wearing a hollowed-out watermelon for a hat (do not ask why). That way, instead of having to scroll through your folders for a while, simply mashing in “watermelon” would give you the result very quickly.

This could be applied to the movies you might get too, so that you might have The Matrix.avi but meta-tag it with “keanu reeves, carrie-anne moss, laurence fishburne, action” and so on. Simply open whatever meta-searching tool you use, and BAM!

I know this would take a LONG time to tag everything perfectly how you want, but at least until you get around to doing it all you can still use the current method. I know I’d take the time to do it. Just someone has to come up with something :)

Luke Maciak says:

May 14, 2010 at 12:14 am

@ Jereme Kramer:

I actually don’t do folder hierarchies for music. I have a single directory called “mus” that acts as a dumping ground for all the mp3 files I obtain from various places. Then I use foobar to find things for me when I want to listen to them.

@ IceBrain:

Dude, these are some very useful links. I am totally checking out Picard and GCStar.

@ Nathan:

Yep, IceBrain is right. Hard links can make the collection harder to maintain or clean up.

@ Jason Scheirer:

I never used BeOS. That’s the funny thing about proprietary technology though – sometimes innovative and clever products are totally overshadowed by crap that is aggressively marketed to the masses.

Back in the day, Amiga OS was in every way superior to the early releases of Windows (which was just a half-assed graphical shell for MS-DOS). And yet, look at what happened.

@ Kim Johnsson:

Bummer. I thought that the libraries were something to look forward to in Win7. Oh well.

Also, good job posting under your real name and mentioning your pr0n collection at the same time. ;)

Btw, locally stored pr0n is superior because:

– the resolution is usually better
– you can skip to the “good parts” without waiting for it to buffer
– *tube sites regularly shuffle their content and take scenes down
– pr0n torrent sites offer a better range of choice, for example if you want to find more scenes with a particular girl, and so on.

@ Owen:

Wait… You are on a Mac. Doesn’t OS X have built-in file tagging? I forgot to mention it in my post, but its metadata system is possibly the best one on the market right now.

Check out this Lifehacker article, which makes suggestions on how to use it.

Owen says:

May 14, 2010 at 12:19 am

@ Luke Maciak:
Ah, fantastic news! Admittedly 90% of my data is on my PC at home, but at least that gives me a step in the right direction in case I end up using my MBP primarily and just doing a dump across to a NAS or similar when I’m at home.

Thanks for the link to the article!

Kim Johnsson says:

May 14, 2010 at 9:27 am

@ Luke Maciak:
Haha, everyone who knows me by name on the internet knows I have porn anyway. Well, except my parents, but they can’t really use the internet so it’s ok. My mum still insists on calling Google “goggle” -_-

Oh, and to counter your points:

– *tube resolution is more than enough for what the content is intended for. HD porn just seems like a waste of space to me, to be honest.
– Most if not all *tube players I’ve used allow you to skip to anywhere without buffering the entire clip. Sure, you have to wait for it to buffer that particular part, but that only takes a few seconds.
– True, content does get taken down sometimes. I don’t have a lot of favorites per se though, and generally they’re not worth the extra effort. There’s always something else anyway.
– I don’t have favorite girls either, so I can’t really argue that point. Sure, torrent sites do often have a better range, but seriously… how much different stuff do you really need? :P If I really care about what girl it is I’ll stick to my own collection, and that doesn’t exist on torrent sites either ;)

Luke Maciak says:

May 14, 2010 at 11:12 am

@ Owen:

Nice! Glad I could help.

I also noticed that Vista has half-implemented tagging support as well. Sadly, I think you can only tag Office documents and JPGs, which makes it fairly useless.

@ Kim Johnsson:

Well, maybe I’m just a hoarder. I prefer to download stuff on the off chance that I will want to revisit it later. This includes movies, TV shows, etc. I usually prefer to grab a torrent even if there is a streaming version available on Hulu or wherever else.

Local copy always seems more substantial to me – even if I delete it right after watching.

Adrian says:

May 17, 2010 at 1:27 am

I do everything you mentioned with my movies, but just in a plain-text file.

First, I type the movie name and then all its properties (actors, producer, country, year, genre, ...). When I’m looking for something in particular, I just do a Ctrl+F for it. Every movie is located in its own folder, so everything is always ordered alphabetically.
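A plain-text catalog like that is also easy to search from a script. The Python sketch below assumes one movie per line, with the title first and the properties after a ` | ` separator; that layout is an assumption, since the comment does not specify one. Matching is a case-insensitive substring search, which is roughly what Ctrl+F does.

```python
CATALOG = """\
The Matrix | keanu reeves, carrie-anne moss, laurence fishburne | 1999 | action, sci-fi
Ace Ventura: Pet Detective | jim carrey | 1994 | comedy
Speed | keanu reeves, sandra bullock | 1994 | action
"""


def search_catalog(text, term):
    """Return the titles of catalog lines that mention `term` anywhere."""
    needle = term.lower()
    titles = []
    for line in text.splitlines():
        if needle in line.lower():
            titles.append(line.split(" | ", 1)[0].strip())
    return titles


print(search_catalog(CATALOG, "Keanu"))  # ['The Matrix', 'Speed']
print(search_catalog(CATALOG, "1994"))   # ['Ace Ventura: Pet Detective', 'Speed']
```

Unlike Ctrl+F, which jumps from hit to hit, this returns every matching title at once.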
