Wanted: metadata server

The mime-type detection conversations on the GNOME lists are very tiring. The discussion has focused on how to get the metadata for a file faster, but getting the metadata for a few dozen files is not a problem. The real problems are:

  1. The directory is an inefficient mechanism for organizing a large number of files.
  2. We cannot query an ordinary file system for the set of files that match some criteria.

I'm all for using extended attributes (EAs) when they are available, but they are not available on every file system that GNOME may run on, which limits them as a solution. Moreover, EAs have hidden costs. They take up more disk space (1-5% disk loss), applications must know to use them, and since most do not, the metadata is also stored in the file's header or footer. EAs can only be accessed in the context of a single file; because the data is not organized into anything like a database designed in the past 30 years, we cannot query them. Nor is the file system normalized to prevent duplication or orphaning of thumbnails and mime-types.

As a point of fact, Medusa does store a table of file metadata, and I can query my file system to get a set of data such as file name and mime-type. I can query by directory, mime-type, keyword (emblem/topic/category), and more. This mechanism returns several thousand matching files in less than a second.
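The kind of set query described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the schema, the table names, and the helpers open_db, add_file, and query are hypothetical and are not Medusa's actual storage format. Mime-types and keywords are stored once in their own tables (normalized, rather than repeated per file), and ON DELETE CASCADE keeps keyword links from being orphaned when a file record is removed.

```python
import sqlite3

# Hypothetical schema: each mime-type and keyword string is stored once and
# referenced by id, so nothing is duplicated per file.
SCHEMA = """
CREATE TABLE mime_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE files (
    id        INTEGER PRIMARY KEY,
    directory TEXT NOT NULL,
    name      TEXT NOT NULL,
    mime_id   INTEGER NOT NULL REFERENCES mime_types(id),
    UNIQUE (directory, name)
);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- ON DELETE CASCADE: removing a file row also removes its keyword links,
-- so no orphaned rows are left behind.
CREATE TABLE file_keywords (
    file_id    INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (file_id, keyword_id)
);
CREATE INDEX files_by_dir ON files(directory);
CREATE INDEX files_by_mime ON files(mime_id);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    db.executescript(SCHEMA)
    return db


def _intern(db, table, name):
    """Return the id for name in table, inserting it the first time it is seen.

    table is always one of our own constant table names, never user input.
    """
    db.execute(f"INSERT OR IGNORE INTO {table}(name) VALUES (?)", (name,))
    return db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def add_file(db, directory, name, mime, keywords=()):
    mime_id = _intern(db, "mime_types", mime)
    cur = db.execute(
        "INSERT INTO files(directory, name, mime_id) VALUES (?, ?, ?)",
        (directory, name, mime_id))
    for kw in keywords:
        db.execute("INSERT OR IGNORE INTO file_keywords VALUES (?, ?)",
                   (cur.lastrowid, _intern(db, "keywords", kw)))


def query(db, directory=None, mime=None, keyword=None):
    """Return (directory, name, mime) rows matching every criterion given."""
    sql = ("SELECT f.directory, f.name, m.name FROM files f "
           "JOIN mime_types m ON m.id = f.mime_id")
    where, args = [], []
    if keyword is not None:
        sql += (" JOIN file_keywords fk ON fk.file_id = f.id"
                " JOIN keywords k ON k.id = fk.keyword_id")
        where.append("k.name = ?")
        args.append(keyword)
    if directory is not None:
        where.append("f.directory = ?")
        args.append(directory)
    if mime is not None:
        where.append("m.name = ?")
        args.append(mime)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return db.execute(sql + " ORDER BY f.directory, f.name", args).fetchall()
```

For example, after add_file(db, "/home/me/docs", "notes.txt", "text/plain", ["work"]), a call such as query(db, mime="text/plain") or query(db, directory="/home/me/docs", keyword="work") returns the matching set in one indexed lookup instead of a directory walk.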

The Storage project, by its nature, addresses the metadata problem. Though it focuses on being a smart file/data system, it can be used to manage the metadata of files outside of it. Because it is a portable file system, it does not run into the EA limitations of the OS's file system. It is designed to return an arbitrary set of data matching a query, such as a directory or a category. I proposed a formal means of handling metadata in Storage, which was discussed on the Storage list.

That said, I don't think Medusa or Storage is appropriate. Medusa's focus is searching, and its underlying code isn't suited to managing metadata well. Storage is, as its name suggests, a storage system, and it doesn't help users or applications that must use the native file system. Both Medusa and Storage provide VFS access, but neither makes it easy to read and write metadata.

We need a layer between search and storage to manage metadata. It must co-opt the existing VFS methods so that IO to the underlying file system also reads from and writes to the metadata system. An incremental indexer is needed to collect the metadata for files not written through VFS. FAM could be used, but it will not scale; we need a smart indexer that watches the locations that change most. Many applications, like Web browsers, music managers, and file managers, need direct access to read and write metadata without writing to the file system.
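The two halves of that layer, write-through updates and an incremental indexer, can be sketched in a few dozen lines of Python. This is a hypothetical illustration, not a GNOME VFS API: write_file stands in for a co-opted VFS write method, MetadataStore is an in-memory stand-in for the metadata database, mime-types come from the standard mimetypes module (extension-based guessing, an assumption), and "changes most" is approximated by a per-directory count of observed changes.

```python
import mimetypes
import os


class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata DB: path -> record."""

    def __init__(self):
        self.records = {}

    def update(self, path):
        st = os.stat(path)
        mime, _ = mimetypes.guess_type(path)  # extension-based guess only
        self.records[path] = {
            "mime": mime or "application/octet-stream",
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns,
        }

    def forget(self, path):
        self.records.pop(path, None)


def write_file(store, path, data):
    """Write-through: the IO call also updates the metadata, as a co-opted
    VFS write method would, so the indexer never has to rediscover it."""
    with open(path, "wb") as f:
        f.write(data)
    store.update(path)


class IncrementalIndexer:
    """Picks up files written without going through write_file.

    A directory is rescanned only when its mtime has changed. Caveat: a
    directory's mtime changes when entries are created, deleted, or renamed,
    not when a file is edited in place, so in-place edits are missed here.
    """

    def __init__(self, store, directories):
        self.store = store
        dirs = [os.path.normpath(d) for d in directories]
        self.seen_mtime = {d: None for d in dirs}
        self.change_count = {d: 0 for d in dirs}

    def run_once(self, budget=None):
        """Rescan changed directories, busiest first; return how many."""
        order = sorted(self.seen_mtime, key=lambda d: -self.change_count[d])
        scanned = 0
        for d in order:
            if budget is not None and scanned >= budget:
                break
            mtime = os.stat(d).st_mtime_ns
            if mtime == self.seen_mtime[d]:
                continue  # nothing added, removed, or renamed here
            self.seen_mtime[d] = mtime
            self.change_count[d] += 1
            self._rescan(d)
            scanned += 1
        return scanned

    def _rescan(self, d):
        present = set()
        with os.scandir(d) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                present.add(entry.path)
                rec = self.store.records.get(entry.path)
                if rec is None or rec["mtime_ns"] != entry.stat().st_mtime_ns:
                    self.store.update(entry.path)
        # Drop records for files deleted without going through the layer.
        stale = [p for p in self.store.records
                 if os.path.dirname(p) == d and p not in present]
        for path in stale:
            self.store.forget(path)
```

Files written through write_file cost the indexer nothing beyond a stat, and the optional budget on run_once lets each pass spend its effort on the directories that have changed most often, rather than treating every location alike.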

One final thought. Metadata isn't a GNOME-only issue; all desktops have the same problems. Freedesktop might be the right place for it. If applications on other desktops, like KDE, were also writing to the metadata DB, there would be less need for indexers.