Better indexers and metadata storage lead to insomnia
I tried to take a break yesterday, but I was too distracted by Medusa developments. I didn't get to sleep until 3:00 AM, got up late for work, and spent the day reading Medusa code instead of working on my company's problems. I blame Marco. I've got to throttle back or I'll burn out.
I played with OTS (Open Text Summarizer), which Marco and Dom recommended. It is a very clever tool. I plan to use it inside the indexer so that only the most important words are indexed. This will significantly reduce the size of the content DB. The ratio feature will let me assign different ratios to mime-types, because source code needs to keep more of its words to be indexed accurately. This beats the snot out of the stop-list I was contemplating. All I need to do is guarantee that the mime-type parser always delivers UTF-8, and that was the plan anyway.
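A rough sketch of the per-mime-type ratio idea. It does not call OTS; a plain word-frequency count stands in for the summarizer, and the `RATIOS` table, its values, and the `select_index_terms` name are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical ratios: the share of distinct words kept for each mime-type.
RATIOS = {
    "text/plain": 0.2,
    "text/x-csrc": 0.6,  # source code keeps more of its words
}
DEFAULT_RATIO = 0.3


def select_index_terms(data, mime_type):
    """Return the words worth indexing, most frequent first."""
    # The mime-type parser is expected to have delivered UTF-8 bytes.
    text = data.decode("utf-8")
    counts = Counter(re.findall(r"\w+", text.lower()))
    if not counts:
        return []
    ratio = RATIOS.get(mime_type, DEFAULT_RATIO)
    keep = max(1, round(len(counts) * ratio))
    # Sort by descending frequency, then alphabetically so output is stable.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:keep]
```

The same bytes produce a short term list when tagged `text/plain` and a longer one when tagged `text/x-csrc`, which is the whole point of having a ratio per mime-type.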
Coupling OTS in the back with an enhanced GSF in the front will make mime-type parsers easy to create. I'll extend GSF so I can get metadata, summary, and attributes from office documents, PDFs, images, and music files for special handling. I'll then get the file data from GSF. I'll need to make some mime-type handlers, but that isn't hard since I'm just extracting text for this phase, and I hope to borrow the importers from Gnumeric and Abiword.
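Here is one way the handler layer might be shaped, as a minimal sketch only. It does not use GSF; the `HANDLERS` registry, the `handler` decorator, and `extract_text` are hypothetical names, and the HTML handler is a deliberately naive tag stripper rather than a real importer.

```python
import re

# Hypothetical registry mapping a mime-type to its text extractor.
HANDLERS = {}


def handler(mime_type):
    """Register the decorated function as the extractor for mime_type."""
    def register(func):
        HANDLERS[mime_type] = func
        return func
    return register


@handler("text/plain")
def plain_text(data):
    return data.decode("utf-8", errors="replace")


@handler("text/html")
def html_text(data):
    # Naive: replace every tag with a space and keep the rest.
    return re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))


def extract_text(data, mime_type):
    """Return UTF-8 bytes for the indexer, or b"" if no handler exists."""
    func = HANDLERS.get(mime_type)
    if func is None:
        return b""
    return func(data).encode("utf-8")
```

Whatever a handler does internally, `extract_text` always hands back UTF-8 bytes, so the summarizing stage behind it only ever sees one encoding.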
But Marco put two things in my head. Most of last night was focused on his need for a metadata DB. Medusa is focused on filesystem metadata, not personal data. Epiphany needs a place to store bookmarks and history, and it must be accessible to all apps. I was planning to make a smart indexer to handle bookmarks and history like a filesystem, but his approach is better. Since I wanted to get all the personal information into Medusa's DB in a future phase, I was going to toss the DB. Well, his summary of the problem really helps me focus on where I want to go. In the end, Medusa will be a much smaller tool, and the Metadata DB will become the primary repository and service. I can't stop thinking about it, but I really cannot think about it until I've got Medusa fully working in Nautilus, and engineered so I can re-architect Medusa without screwing with Nautilus. I doubt I can start rebuilding the DB, and splitting Medusa into an indexer project and a storage project, until next year, but Marco wants this started sooner.
I looked at DrWright's activity monitor code today at work. I think GNOME needs an idle monitor. Apps like Medusa and DrWright can connect to it. GNOME should split the screensaver, power management, and system lock and make them clients of the idle monitor.
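To make the idle monitor idea concrete, here is a minimal sketch of one monitor with several clients. It is not based on DrWright's code or any GNOME API; the `IdleMonitor` class and its `connect`, `activity`, and `tick` methods are hypothetical, and time is passed in as plain seconds so the example stays deterministic.

```python
class IdleMonitor:
    """Tracks the last user activity and notifies clients after a delay."""

    def __init__(self):
        self._last_activity = 0.0
        self._clients = []

    def connect(self, threshold, callback):
        """Call callback(idle_seconds) once idle time reaches threshold."""
        self._clients.append(
            {"threshold": threshold, "callback": callback, "fired": False}
        )

    def activity(self, now):
        """Record keyboard or mouse activity and re-arm every client."""
        self._last_activity = now
        for client in self._clients:
            client["fired"] = False

    def tick(self, now):
        """Check idle time and notify clients whose threshold has passed."""
        idle = now - self._last_activity
        for client in self._clients:
            if not client["fired"] and idle >= client["threshold"]:
                client["fired"] = True
                client["callback"](idle)
```

A screensaver might connect with a short threshold and an indexer like Medusa with a longer one. Each is notified once per idle period, and any activity re-arms them all.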