Lunch time thoughts about search and metadata

I've been too busy to make any metadata/search contributions this year, but I'm trying to make time for what is important: a desktop that just works.

Since much of Evolution-Data-Server deals with metadata, would it be wiser to store all metadata in e-d-s? I've always planned to put metadata below gnome-vfs because file and user metadata predates gnome-vfs operations. But by extending e-d-s to contain file data such as POSIX attributes, EXIF, and ID3, we would have one, albeit fractured, source for metadata.

I've long advocated separating search from metadata because there will never be a single source. I'm pulling the metadata database out of Medusa because users will want to search Google and p2p shares like iFolder and Gnutella. Moreover, by separating the repository from the querier, I can address some security issues by having public and private native GNOME repositories. The querier translates and dispatches queries to the repositories, then merges the results.
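To make the repository/querier split concrete, here is a minimal Python sketch of a querier that fans one query out to several repositories and merges the results. Everything in it is hypothetical: the `Repository` and `Querier` classes, the `(key, value)` query form, and the `include_private` flag, which stands in for whatever access control the public/private split would really use. A real Google or Gnutella backend would translate the query into a URL or protocol message rather than a Python predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A hypothetical metadata repository (native GNOME store, web search, p2p share)."""
    name: str
    public: bool
    # records maps a URI to a dict of metadata key/value pairs.
    records: dict = field(default_factory=dict)

    def translate(self, query):
        """Translate the generic (key, value) query into this repository's dialect.

        Here the 'dialect' is just a predicate over a record's metadata.
        """
        key, value = query
        return lambda meta: meta.get(key) == value

    def search(self, query):
        match = self.translate(query)
        return {uri: meta for uri, meta in self.records.items() if match(meta)}


class Querier:
    """Dispatches one query to many repositories and merges the results."""

    def __init__(self, repositories):
        self.repositories = repositories

    def query(self, query, include_private=False):
        merged = {}
        for repo in self.repositories:
            # Private repositories are consulted only when explicitly allowed.
            if not repo.public and not include_private:
                continue
            for uri, meta in repo.search(query).items():
                # The same URI from two repositories becomes one entry whose
                # metadata is the union; later repositories win on conflicts.
                entry = merged.setdefault(uri, {"meta": {}, "sources": []})
                entry["meta"].update(meta)
                entry["sources"].append(repo.name)
        return merged
```

Merging by URI means a file reported by both a native repository and a p2p share appears once, with the union of its metadata and a list of where it came from. A real querier would need an explicit policy for conflicting values; this sketch simply lets the last repository win.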

A smart GNOME-metadata-daemon will provide direct metadata access to manager apps like Nautilus and Rhythmbox, and to associative tools like bookmarks for Epiphany. Querying and reconciling the contents of a folder with thousands of files will be faster than scanning the disk for each file.
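The reconciliation step might look something like the following sketch, assuming the daemon caches each file's modification time alongside its metadata. The `reconcile` function and the cache layout are hypothetical. Listing the folder still costs one `stat` per entry, but only new or modified files have to be opened and parsed; everything else is answered from the repository.

```python
import os


def reconcile(folder, cache):
    """Compare a folder listing against cached metadata.

    cache maps a file name to (mtime, metadata). Returns (fresh, stale, gone):
      fresh: name -> cached metadata that is still valid (no file read needed)
      stale: sorted names that are new or modified and must be re-extracted
      gone:  sorted cached names that no longer exist in the folder
    """
    fresh, stale, seen = {}, [], set()
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # subdirectories are reconciled separately
            seen.add(entry.name)
            mtime = entry.stat().st_mtime
            cached = cache.get(entry.name)
            if cached is not None and cached[0] == mtime:
                fresh[entry.name] = cached[1]
            else:
                stale.append(entry.name)
    gone = sorted(set(cache) - seen)
    return fresh, sorted(stale), gone
```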

Several strategies are needed to keep the metadata repository current. The g-m-d may use FAM to watch the few folders where changes happen frequently, calling an indexer to extract metadata after each file update. From time to time, the g-m-d will launch an incremental indexer to crawl personal folders and update the repository, and it will examine the changes found to update the set of watched directories. GNOME-VFS may also be aware of the g-m-d and use an introspection library to extract metadata during writes.
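One way the crawl-and-adjust loop could work is sketched below: each pass records modification times, counts changed files per directory, and the busiest directories become candidates for FAM watches. The function names, the per-directory counting heuristic, and the default `limit` of five watches are all assumptions for illustration; the sketch does not call any real FAM API.

```python
import os
from collections import Counter


def crawl(root, last_seen):
    """One incremental crawl pass over root.

    last_seen maps path -> mtime from earlier passes and is updated in place.
    Returns (changed_files, change_counts), where change_counts counts changed
    files per directory. Deleted files are not tracked in this sketch.
    """
    changed, counts = [], Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # the file vanished mid-crawl
            if last_seen.get(path) != mtime:
                last_seen[path] = mtime
                changed.append(path)  # would be handed to the indexer
                counts[dirpath] += 1
    return changed, counts


def choose_watches(counts, limit=5):
    """Pick the directories that changed most often as watch candidates."""
    return [directory for directory, _n in counts.most_common(limit)]
```

Keeping a running `Counter` across passes (`history.update(counts)`) and passing that to `choose_watches` would let the watch set drift toward folders that change often over days, rather than reacting to a single pass.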

The GNOME-metadata-daemon is really a mess on my hard drive. Adding this kind of code to the dependency list is a bit tricky.

                         Indexer
                            |
       libintrospection (metadata extraction + creation)
                            |
              GNOME-VFS (virtual filesystem)
                            |
    GNOME-metadata-daemon (controls access to the repository)
                            |
       libmetadata (RDF obfuscation + schema management)
                            |
               librdf (metadata repository)

libintrospection would normally sit on top of GNOME-VFS. To catch the data being written by a GNOME app, however, libintrospection would have to be either below GNOME-VFS or part of it. libintrospection would work much like GStreamer pipelines: pipelines of introspectors would be called to extract and create metadata. Because the mime-type is itself an aspect of metadata, the pipeline manager must construct the pipeline as data is extracted; after each step, it must determine which introspector to call next to complete the gathering of metadata. Introspectors would be registered in GConf, as thumbnailers are, so applications can add new introspectors.
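Here is a minimal sketch of such a pipeline manager, assuming introspectors are plain functions and a dictionary stands in for GConf registration (no real GConf keys or GStreamer APIs are used). What it illustrates is that the pipeline is not fixed up front: once a generic introspector discovers the mime-type, type-specific introspectors become eligible and the manager picks them up on the next step. The toy `sniff_mime`, `size`, and `id3_version` introspectors are hypothetical; the ID3 check only reads the version bytes of an ID3v2 header.

```python
from typing import Callable, Dict, List, Optional

# An introspector takes the raw bytes plus the metadata gathered so far and
# returns new key/value pairs.
Introspector = Callable[[bytes, dict], dict]


class PipelineManager:
    """Builds an introspection pipeline one step at a time."""

    def __init__(self) -> None:
        # Stand-in for GConf: mime-type (or "*" for any file) -> introspectors.
        self.registry: Dict[str, List[Introspector]] = {}

    def register(self, mime_type: str, introspector: Introspector) -> None:
        self.registry.setdefault(mime_type, []).append(introspector)

    def _next(self, metadata: dict, done: set) -> Optional[Introspector]:
        # Generic introspectors first; once the mime-type is known, its handlers.
        candidates = list(self.registry.get("*", []))
        if "mime-type" in metadata:
            candidates += self.registry.get(metadata["mime-type"], [])
        for intro in candidates:
            if intro not in done:
                return intro
        return None

    def run(self, data: bytes) -> dict:
        metadata: dict = {}
        done: set = set()
        while (intro := self._next(metadata, done)) is not None:
            done.add(intro)
            metadata.update(intro(data, metadata))
        return metadata


def sniff_mime(data: bytes, meta: dict) -> dict:
    # Toy magic-number check; a real one would consult a mime database.
    if data.startswith(b"ID3"):
        return {"mime-type": "audio/mpeg"}
    return {"mime-type": "application/octet-stream"}


def size(data: bytes, meta: dict) -> dict:
    return {"size": len(data)}


def id3_version(data: bytes, meta: dict) -> dict:
    # ID3v2 header: "ID3", then the major-version and revision bytes.
    if len(data) < 5:
        return {}
    return {"id3-version": f"2.{data[3]}.{data[4]}"}


manager = PipelineManager()
manager.register("*", sniff_mime)
manager.register("*", size)
manager.register("audio/mpeg", id3_version)
print(manager.run(b"ID3\x03\x00\x00" + b"\x00" * 4))
# {'mime-type': 'audio/mpeg', 'size': 10, 'id3-version': '2.3.0'}
```

An application adding support for a new format would only call `register` with its mime-type; the manager discovers the new introspector the next time that mime-type turns up.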