The hierarchical file system we have right now is great. The file/folder/directory metaphor works very well and makes it easy to organize your files on the disk in a way that makes them easy to find. The tree data structure inherent in this design makes implementation of software that interfaces with it easy and well understood. But it has some fundamental flaws. Let me give you an example.
Let’s say you have an extensive collection of movies and you want to organize the shit out of them so that they are easy to find. How would you go about it in a hierarchical file system? How do you split them up?
- By genre?
- By lead actor?
- By director?
This is not a trivial choice because it will affect how you will retrieve movies in the future. If you organize them by genre, you will have no easy way to find all the movies with your favorite actress. If you categorize them by actors then you lose the genre information. What we have here is a non-hierarchical data set that we are trying to shoe-horn into a hierarchical data structure.
Let me give you another example. Let’s say you are the Organized Natalie Portman guy and you see this picture. Where do you file it: in your Natalie Portman folder or in the Scarlett Johansson folder? Or both? I’m not even going to mention pr0n collections, which tend to be organized by participants and the activities they perform (usually more than one per scene).
The point is that there is a lot of data out there that cannot be easily filed under a hierarchical system because they have more than one attribute according to which they could be categorized. The only way to fit them to that model is to give up on all but one of these attributes.
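To make the "more than one attribute" point concrete, here is a minimal sketch of the alternative: instead of picking one attribute to become the folder, keep an index from every attribute to the files that carry it. The `TagIndex` class, its method names and the file names are all made up for illustration; this is not how any particular file system does it.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: (attribute, value) -> set of file paths."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def add(self, path, **attributes):
        """Register a file under any number of attributes.

        Each attribute may be a single string or a list of strings.
        """
        for key, values in attributes.items():
            if isinstance(values, str):
                values = [values]
            for value in values:
                self._files_by_tag[(key, value.lower())].add(path)

    def find(self, **criteria):
        """Return the files that match ALL of the given attribute values."""
        matches = [
            self._files_by_tag.get((key, value.lower()), set())
            for key, value in criteria.items()
        ]
        if not matches:
            return set()
        return set.intersection(*matches)


index = TagIndex()
# File names below are placeholders, not real collection entries.
index.add("movie_a.avi", genre="comedy", actor=["Natalie Portman"])
index.add("movie_b.avi", genre="drama",
          actor=["Natalie Portman", "Scarlett Johansson"])

print(index.find(actor="natalie portman"))
print(index.find(genre="drama", actor="Scarlett Johansson"))
```

The same file shows up under every attribute it has, so nothing is lost by choosing genre over actor or the other way around.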
Of course you could try to be fancy and create multiple hierarchies and then use links and/or shortcuts to make it work. So for example the file is stored in the “Comedy” folder, but you can also get to it from the “Natalie Portman” folder and so on. Unfortunately this creates a good deal of overhead. When you delete a file, the change will usually not cascade and remove its links, so you may need to do that manually. In other words, maintaining such a hierarchy is much more work than just maintaining a single file tree.
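Here is a rough sketch of that maintenance problem, assuming a system where symbolic links are available (on Windows, creating them may need extra privileges). The folder names come from the example above; the movie file name and the `find_dangling_links` helper are hypothetical.

```python
import os
import tempfile


def find_dangling_links(root):
    """Return symlinks under root whose targets no longer exist."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


def demo():
    """Build a tiny two-hierarchy tree, delete the file, report leftovers."""
    with tempfile.TemporaryDirectory() as root:
        comedy = os.path.join(root, "Comedy")
        actor = os.path.join(root, "Natalie Portman")
        os.makedirs(comedy)
        os.makedirs(actor)

        # The "real" file lives in the genre folder.
        movie = os.path.join(comedy, "some_movie.avi")
        with open(movie, "w") as handle:
            handle.write("placeholder")

        # The actor folder only holds a link to it.
        os.symlink(movie, os.path.join(actor, "some_movie.avi"))

        before = find_dangling_links(root)
        os.remove(movie)  # deleting the file does not touch the link
        after = find_dangling_links(root)
        return before, [os.path.relpath(p, root) for p in after]


before, after = demo()
print(before)
print(after)
```

After the delete, the link in the actor folder is still there and points at nothing, so you end up writing cleanup helpers like this one just to keep the second hierarchy honest.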
If I remember correctly, the fabled WinFS that was supposed to ship with Vista was being designed to resolve this exact problem. It was a database-driven FS that was supposed to keep track of relationships between your files and allow for complex queries. Unfortunately it turned out to be vaporware. There are several open source projects out there that aim to accomplish something similar, but they are mostly proof-of-concept type things. None of them has any mainstream traction, and there is no Linux distro that ships with such an FS out of the box.
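To give a feel for what "querying your files" could look like, here is a small sketch using SQLite from Python's standard library. To be clear, this is not WinFS or any of those open source projects; the table layout, the function names and the sample file names are all my own assumptions for the sake of the example.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog of files and their attributes."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE files (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL
        );
        CREATE TABLE attributes (
            file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
            name    TEXT NOT NULL,
            value   TEXT NOT NULL
        );
    """)
    return conn


def add_file(conn, path, attributes):
    """Insert a file plus a list of (name, value) attribute pairs."""
    cursor = conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
    conn.executemany(
        "INSERT INTO attributes (file_id, name, value) VALUES (?, ?, ?)",
        [(cursor.lastrowid, name, value) for name, value in attributes],
    )


def find_files(conn, genre, actor):
    """Return paths of files that have both the given genre and actor."""
    rows = conn.execute("""
        SELECT f.path
        FROM files f
        JOIN attributes g
          ON g.file_id = f.id AND g.name = 'genre' AND g.value = ?
        JOIN attributes a
          ON a.file_id = f.id AND a.name = 'actor' AND a.value = ?
        ORDER BY f.path
    """, (genre, actor))
    return [row[0] for row in rows]


catalog = build_catalog()
add_file(catalog, "movie_a.avi",
         [("genre", "comedy"), ("actor", "Natalie Portman")])
add_file(catalog, "movie_b.avi",
         [("genre", "drama"), ("actor", "Natalie Portman")])
print(find_files(catalog, "comedy", "Natalie Portman"))
```

A nice side effect of the relational approach is that deleting a row from `files` takes its attribute rows with it, which is exactly the cascade that links and shortcuts do not give you.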
As cool as a relationship-based FS sounds, the idea does not seem to have any traction in dev communities. For one, it is overly complex. Traditional hierarchical file systems have been around for years and they have been working just fine. Scrapping them now in order to embrace a new paradigm seems quite radical and risky. It is better to extend them to do what we want them to do rather than to reinvent the wheel. People just feel uneasy “querying” a database for their files rather than just traversing a tree… Yet everyone wants to have their files indexed and put in a database anyway, as evidenced by the popularity of Google Desktop Search and its poor imitation, the Windows search available in Vista and Win7.
We already have half of the problem figured out. Can’t find a movie? Search for it. Why traverse a tree if you can just do a lookup?
Hell, if you use a clever file naming convention (e.g. Ace Ventura: Pet Detective [genre: comedy][actor: jim carrey].avi) you can have your category problem all worked out for you. Double points if your OS supports attaching custom meta tags to files.
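If you go the naming convention route, pulling the categories back out is easy. Below is a minimal sketch that assumes tags always look like `[key: value]` and sit somewhere in the file name; the `parse_tagged_name` function and the regular expression are my own invention, not part of any tool. One caveat: a colon is not a legal character in Windows file names, so the title in this example would need adjusting there.

```python
import os
import re

# Matches one "[key: value]" tag; key is a single word, value is anything
# up to the closing bracket.
TAG_PATTERN = re.compile(r"\[(\w+):\s*([^\]]+)\]")


def parse_tagged_name(filename):
    """Split a tagged file name into (title, {key: [values]})."""
    stem, _extension = os.path.splitext(filename)
    tags = {}
    for key, value in TAG_PATTERN.findall(stem):
        # A key may repeat, e.g. several [actor: ...] tags on one file.
        tags.setdefault(key.lower(), []).append(value.strip())
    title = TAG_PATTERN.sub("", stem).strip()
    return title, tags


title, tags = parse_tagged_name(
    "Ace Ventura: Pet Detective [genre: comedy][actor: jim carrey].avi"
)
print(title)  # Ace Ventura: Pet Detective
print(tags)   # {'genre': ['comedy'], 'actor': ['jim carrey']}
```

Keeping values in a list per key means a movie can carry more than one actor or genre without any change to the convention.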
Windows 7 seems to have another piece of the puzzle in the form of libraries. Win7 libraries are basically virtual folders that store pointers to files that live somewhere on the file system. They allow you to group files into categories without actually moving them, or messing with links. I haven’t really used this feature myself because I do not currently own Win 7. But it does sound like it is a step in the right direction.
The Linux ecosystem seems to be doing something similar with tagging and search. For example, GNOME users now have access to the MetaTracker project, which lets them add meta-tags to their files and has a built-in search engine that indexes them, implementing something very similar to the libraries. KDE has Strigi, which is being combined with the NEPOMUK framework with the aim of creating a similar feature set.
It will be interesting to see where this trend will lead us in the future. Of course we have to take into account that there is another trend converging with the search+tagging movement. Cloud-based storage is gaining more and more acceptance every day. A few years ago it would have been inconceivable to think about storing your files on some remote server. What if there is a network outage? What if they delete it by accident? What if I need it quick and don’t have time to wait for it to download?
Today most people feel that storing files in the cloud is more reliable than storing them on their own hard drive. After all, they already got used to the idea that every few weeks some virus will ravage their OS and they will have to make nice with some geek to get it reinstalled. Having Google or some other service store their files remotely seems a great idea. Not to mention that these systems already have powerful search features enabled. So we might eventually get that relational FS thing – only it will be a remote file system living in the cloud.
Any Windows 7 users here? How do you like the library feature? Is it pretty useful or rather limited? Have you used it to organize your files, or are you one of those people who indiscriminately keep everything in the root of the Documents folder? How about the Linux flavors? I use KDE mostly, but haven’t really messed around with Strigi.