As my long weekend plans were dashed by Time Life, I did not get Medusa's GUI fully working. I did refactor the UI to behave better, and added the missing menu of available emblems. I am surprised that the emblem's image does not display in the GtkOptionMenu, but does show in the GtkMenu when I select it. I'm not sure if this is a GTK2 behavior. Images do show in the GTK GtkOptionMenu hidden in Nautilus.

I'll add the query this week. I think I'll have a working app next week after I have created the results list.

I decided to postpone using GtkExpander until it is available in Glade, but I've made some refinements to msearch-gui nonetheless. I'm using the checkbox to hide and show the clauses. Only the visible clauses will form the query. I spent more time than necessary making gettext-friendly label changes for the checkboxes; I've got to stop moving between languages and frameworks. I've got to hook the new editor behavior up to the query generator next.

I'll have a workable Medusa interface soon. Soon I'll start putting this work into Nautilus.

I've been working intermittently on a GUI search tool that I can use to make widgets and dialogs that can be reused by other applications like Nautilus. I've spent more time than necessary exploring ideas. I looked into making GtkExpander work from glade-2 and libglade, but it is too much of a distraction to pursue. I'll use checkboxes for the time being and will update when glade-3 is available.

I lost my private Nautilus-with-Medusa build in my exploration of jhbuild/Fedora/Ximian unstable. I've settled on Ximian unstable for the time being (I love the rug tool). I've mixed in a few Fedora packages to get glade-2 and a few other tools working. I just built dia 9.2 so I can see and edit my diagrams again. I really do a lot of drawing before I ever start putting code to compiler.

I made some small but important changes to the Medusa GUI app I'm making. The sidebar is nearly complete. I'll swipe the file list from gnome-search-tool in a few days. After that, I'll create a rules editor like g-s-t and Evolution have to explore another UI option.

I finally closed some Medusa bugs now that the fixes are in HEAD. Hurricane Isabel's distractions delayed me for a few days. I got a reply from Calvin Smith who seemed pleased that his patch was finally applied, two years after he submitted it. I have another set of defects almost fixed. I'll close them with the 6.1 release which targets Nautilus. I finally located my sophomoric error in the signal handling of the GUI Medusa search tool I'm making. I think the rest of the changes will go quickly and I hope to have Medusa ready to provide Nautilus with the search functions it needs. The next biggest lot of bugs is in the indexer, which I hope to close in October.

I've merged my changes into Medusa HEAD. Medusa now runs as a user indexer and search tool. I haven't committed any of my GUI experiments for a GUI finder, and Nautilus integration. Medusa will index a user's home directory, files and file content, and msearch can query the db.

I fixed the make distcheck so tarballs of the working (not HEAD) version can be distributed. I made my changes in a second tree because my tree is very dirty with GUI experiments. I'll need to resolve the conflicts in my gui tree before I can return to my experiments. I'm adding a GUI search tool modeled upon gnome-search-tool. My GUI experiments are for Nautilus, but I'd like to reuse some of the code instead of throwing it away. Providing a desktop tool with Medusa will prove the GUI components are portable, provide an example of how to use them, and make it easier for users/developers to take advantage of Medusa without Nautilus.

The Sutra data model is strongly influenced by the Friend of a Friend vocabulary (FOAF), created by Dan Brickley and Libby Miller and popularized in articles by Edd Dumbill. In this draft of Sutra's ontology, the world is made up of two general kinds of objects, Agents and Documents. Noticeably absent from the diagram below are events and a class representing services and software; they will be addressed in the future. Sutra uses RDF-Schema to define object types, but XML-Schema and similar mechanisms could be used in the future. In most cases, a resource may have many instances of a particular property: a person made (property) many resources, and each will have several topics (properties).

Things that cause change are represented by a resource called an Agent (rdfs:Class). The agent's properties describe its online identity. The most common form (rdfs:subClassOf) of agent is a Person, whose properties may describe real world attributes. Agents may also be represented (rdfs:subClassOf) as Groups, Projects, or Organizations. Sutra uses Agent because it is a crucial resource for establishing relationships between resources. Much of the agent information will be derived from, and interact with, personal information apps like Evolution, and external sources like LDAP and Web sites. See the FOAF documentation for the definition of Agents and their properties.

Documents are things that agents create. Documents are digital artifacts that represent information in Sutra. The properties of the basic Document correspond well with both Nautilus and Web browsers. Nautilus displays document type, thumbnail, icons, and emblems, which compare well to format, thumbnail, logo, and topic respectively. Note that Medusa currently indexes emblems as keywords for searches. Web browsers use content-type, icons, and keywords/classification for bookmarks, which likewise agree with format, thumbnail, logo, and topic.

A Document's properties can be augmented with Posix and DC (Dublin Core) properties from a file system or the Internet. A document may have only one of each property from Posix or DC. There are many subclasses of Document that represent basic kinds of data and their intrinsic properties. Audio files have recording/playback properties, and some like OGG will have ID3 properties about artist and title. Image metadata varies considerably between formats; most have dimension properties, and the EXIF subclass used by digital cameras adds additional properties. The Video type is an example of what might be done; more research is needed to understand what technical and creative properties are used. The XML document type only offers encoding and language because little is known about a grammar without a schema. It's feasible to parse XML for known metadata grammars, like Dublin Core. Two common XML (and SGML for many older forms) grammars are HTML (Page) and DocBook. The Office type represents writings and spreadsheets from office suite software, and PDF/PostScript documents.

In the Sutra data model, subclasses inherit properties from their superclass, so an Office document would have Office properties, plus Document, plus Resource. It may have additional properties from Posix or DC. If the data model were extended with ad hoc classes, or Office were redefined via RDFS to descend from another class, it would gain more properties. For instance, if a Usage class representing frequency, access, and by whom were registered as an extension of Resource's properties, Office would acquire them too.
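As a rough illustration of that inheritance, the sketch below walks a superclass chain to collect properties. The class names come from the text; the helper names and the "uri", "pages", and "frequency" property names are made up for the example and are not part of any Sutra code.

```python
# Hypothetical class table: name -> (superclass, own properties).
# Class names follow the text; the property names are illustrative only.
CLASSES = {
    "Resource": (None, ["uri"]),
    "Document": ("Resource", ["format", "thumbnail", "logo", "topic"]),
    "Office": ("Document", ["pages"]),
}


def inherited_properties(name, classes):
    """Collect a class's own properties plus those of all its superclasses."""
    props = []
    while name is not None:
        parent, own = classes[name]
        props = own + props  # superclass properties end up first
        name = parent
    return props


def register_extension(classes, target, new_props):
    """Attach extra properties (e.g. a Usage extension) to an existing class."""
    parent, own = classes[target]
    classes[target] = (parent, own + new_props)
```

Calling register_extension(CLASSES, "Resource", ["frequency"]) makes "frequency" show up in inherited_properties("Office", CLASSES) as well, which is the Usage scenario in miniature.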

Sutra data model

Sutra is a proposed metadata database. It stores and retrieves data about local and remote resources such as files and people. Some properties are intrinsic to the resource, like image size or music artist. Other properties are external, such as filename or creator. Resource properties are discovered through incident, and attributed in an ad hoc fashion. Property names and purposes are not rigorously defined, nor required, including intrinsic properties. Sutra requires an extensible data model and a flexible database to accomplish its purpose.

The Storage schema is simple: ChildNode where anonymous, numbered resources are linked, three attribute tables where resources keep their named properties, TextRecords where resource content is stored, and two tables to issue and record resource ids to link everything. This schema has some shortcomings. Some links between resources are named, but cannot be represented. For instance, the author of a resource might be represented by an entry in an address book, and that in turn might link to a homepage. In this scenario the link is both an attribute and a ChildNode, but that cannot be easily represented in the database. Named links could be put in the NumberSoup table, but there is no means to indicate that the number isn't a literal value, but a reference to another ChildNode. The operational data about how attributes work can be stored in the attribute tables, mixing with the content data, but this can lead to conflicts between libstorage and the content it manages. Additionally, there is no namespace facility to prevent attribute name collision between different kinds of resources. There is no means to define attribute names, their use, and to what they belong. The three attribute tables represent only three simple data types, and cannot manage more complex or refined data types like URIs or bytes, but that is the very type of data it proposes to store.

The Sutra schema is simple and it addresses Storage's problems by moving and consolidating property information, and adding a table to store type information. The Resource table stores resources by unique id and URI, and resource literal and link properties are stored in the Property table. The Type table defines all properties, and is linked to the Property name column. The Content table is identical to Storage's TextRecord table.

Sutra-Storage schema

The Resource table represents resources in two ways, by a unique numeric id and a unique URI. The uid is used as a foreign key in the Property and Content tables. The URI must be valid, and may be real. Anonymous resources are represented by valid, but meaningless URIs.

The Property table represents the union of Storage's three attribute tables and ChildNode's link responsibility. The resource and type columns are foreign keys representing the Resource and Type tables respectively. The value is a string because it is the most versatile format to represent data types, as exemplified in XML-Schema. Most data Storage deals with is string data, so little work will be needed to convert it. In the rare case of numbers and dates, comparison operations can be accomplished easily and efficiently with a correct encoding, such as zero-padded numbers or ISO 8601 dates, which sort correctly as plain strings. The isliteral column is a flag indicating that the value is not a foreign key pointing to a resource. The weight column represents a means to order and identify properties that have the same resource and type. It is a mutable id that might be changed by applications to indicate that the higher valued properties are more relevant and commonly used. There are scenarios where the value might point to a resource, but the application saving the information chooses to save the literal value. Typing information could be used to declare whether a value is literal or link, but that can lead to problems when there is no resource to link to, or when the resource table is filled with entities that represent leaf nodes. The property value can be converted from link to literal as needed.
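To make the literal-versus-link idea concrete, here is a minimal sketch using Python's sqlite3 module. The table and column names follow the description above, but the SQL types, the CREATOR type id, the sample URIs, and the helper functions are all assumptions made for the example.

```python
import sqlite3

# Assumed SQL types; the text names the columns but not how they are declared.
SCHEMA = """
CREATE TABLE Resource (
    id  INTEGER PRIMARY KEY,
    uri TEXT NOT NULL UNIQUE
);
CREATE TABLE Property (
    resource  INTEGER NOT NULL REFERENCES Resource(id),
    type      INTEGER NOT NULL,   -- would reference the Type table
    value     TEXT NOT NULL,      -- literal text, or a Resource id when isliteral = 0
    isliteral INTEGER NOT NULL,   -- 1 = literal value, 0 = link to a resource
    weight    INTEGER NOT NULL DEFAULT 0
);
"""

CREATOR = 1  # hypothetical type id standing in for a "creator" property


def build_example():
    """Create an in-memory database with one document and two creator properties."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO Resource VALUES (1, 'file:///home/user/report.txt')")
    db.execute("INSERT INTO Resource VALUES (2, 'urn:example:addressbook/alice')")
    # One creator saved as a link to an address book entry, one as a plain string.
    db.execute("INSERT INTO Property VALUES (1, ?, '2', 0, 10)", (CREATOR,))
    db.execute("INSERT INTO Property VALUES (1, ?, 'Bob', 1, 5)", (CREATOR,))
    return db


def creators(db, resource_id):
    """Return creator values, highest weight first, resolving links to URIs."""
    rows = db.execute(
        """
        SELECT CASE WHEN p.isliteral = 1 THEN p.value ELSE r.uri END
        FROM Property AS p
        LEFT JOIN Resource AS r
               ON p.isliteral = 0 AND r.id = CAST(p.value AS INTEGER)
        WHERE p.resource = ? AND p.type = ?
        ORDER BY p.weight DESC
        """,
        (resource_id, CREATOR),
    )
    return [row[0] for row in rows]
```

Both creators live in the same table with the same type; only the isliteral flag decides whether the value is read as text or followed to another resource, and weight decides which one an application would show first.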

The Type table contains the definition of the types of properties in the Property table. Each type has a unique id. The namespace column is a token indicating to which group the property type belongs. It is synonymous with XML-Namespace and similar to namespaces in some programming languages. The property column is the name of the type and is used for external representations of the properties. The datatype column indicates the kind of data a property is and how it should be stored, as defined in XML-Schema data types. The domain column comes from RDF-Schema, where objects can be defined as classes (resources) or properties. Domain is a foreign key linking to another entity in the Type table. In the Sutra model, classes and properties are stored in the same table and used similarly. The super column is a foreign key pointing to the super type of the type. The Type table provides a means to define all the kinds of resources and properties in the database. Strong and weak queries can be performed by restricting property matching to exactly a property, or to a group constructed from the super relation of properties, respectively. Applications can extend and explore the data in the Type table to store new kinds of data in the database and query it. The domain is used to distinguish between properties that define major objects (resources) and those that define minor objects (properties). The property concept from RDF-Schema allows properties to be attributed to resources in an ad hoc fashion; classes do not require attributes. A resource acquires properties by context (namespace). For example, an image has intrinsic properties like width and height, plus properties like filename and size because it is stored on a file system.
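The strong and weak queries can be sketched with a recursive query over the super column. This is only one reading of the idea: it assumes a weak query matches a type plus every type that descends from it. The SQL types, the "ex" namespace, the sample types, and the function names are invented for the example.

```python
import sqlite3

# Assumed SQL types; only the columns needed for the queries are included.
SCHEMA = """
CREATE TABLE Type (
    id        INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL,
    property  TEXT NOT NULL,
    super     INTEGER REFERENCES Type(id)
);
CREATE TABLE Property (
    resource INTEGER NOT NULL,
    type     INTEGER NOT NULL REFERENCES Type(id),
    value    TEXT NOT NULL
);
"""


def build_example():
    """Create a small type hierarchy and a few properties that use it."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Hypothetical types: "author" and "photographer" both descend from "creator".
    db.executemany(
        "INSERT INTO Type VALUES (?, ?, ?, ?)",
        [(1, "dc", "creator", None), (2, "ex", "author", 1), (3, "ex", "photographer", 1)],
    )
    db.executemany(
        "INSERT INTO Property VALUES (?, ?, ?)",
        [(10, 1, "Alice"), (11, 2, "Alice"), (12, 3, "Alice"), (13, 2, "Bob")],
    )
    return db


def strong_query(db, type_id, value):
    """Match exactly one property type."""
    rows = db.execute(
        "SELECT resource FROM Property WHERE type = ? AND value = ? ORDER BY resource",
        (type_id, value),
    )
    return [r[0] for r in rows]


def weak_query(db, type_id, value):
    """Match the given type and every type reachable through the super relation."""
    rows = db.execute(
        """
        WITH RECURSIVE family(id) AS (
            SELECT ?
            UNION
            SELECT Type.id FROM Type JOIN family ON Type.super = family.id
        )
        SELECT resource FROM Property
        WHERE type IN (SELECT id FROM family) AND value = ?
        ORDER BY resource
        """,
        (type_id, value),
    )
    return [r[0] for r in rows]
```

With this data, a strong query for creator "Alice" finds only the resource tagged with creator itself, while the weak query also finds the resources tagged with author and photographer.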

The Content table contains file data. It represents the body of a resource, when a resource is a digital artifact that has a body, such as a document. Storage does not know the structure of files, so while a file might be a compound document of several kinds of content, it is stored as a single block.

It should be known that in the grand schema of all things Storage, I am the toady to s/master/maintainer/ Seth. Sutra, a metadata database, is my concept to solve the limitations that Medusa faces, and address the storage concerns that Marco brought to my attention regarding bookmarks and browser history. I intend to get Medusa working well as a file indexer/search tool for posix file data. I will replace Medusa parts with Sutra parts to provide richer metadata and search services. Seth will likely replace Sutra parts with Storage parts to provide better content management and search features.

The W3C published a slew of RDF drafts today. Lots to read. I've been reading more about strategies for mapping RDF to RDBMS. Reification appears to be too complex to solve. I'm not going to pursue it since better minds than mine have failed. Still, I've got a good idea how to make subject, predicate, object graphs work in a DB using the ideas of RDF Schema and OWL.

Interestingly, National Public Radio (NPR in the USA) had an article on magnetic resonance imaging (MRI) of the brain as people solve problems. The network of cells that knows tools is different from the network of cells that knows animals. Thinking about doing something (motion) often leads to the tool network. There is a high cost to get the name of a tool the first time. Each subsequent attempt to recall takes less energy, with evidence that the network is changing its connections to get the answer quickly.

/me thinks each cell keeps state of the last connections used.

/me thinks each node changes the order of its index of connections so that the most recently used link is at the top of the stack. Subsequent calls are faster because there are fewer steps in locating matching nodes.

/me thinks a helper cell must test if four or more neuron connections always occur in a sequence; if so, a connection is made from start to finish to shorten the path.
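A minimal sketch of that reordering, which is essentially the move-to-front heuristic for self-organizing lists. The Node class and the tool names are invented for illustration; this models nothing about real neurons.

```python
class Node:
    """A node whose outgoing links are reordered by recency of use."""

    def __init__(self, name):
        self.name = name
        self.links = []  # neighbouring nodes, most recently used first

    def connect(self, other):
        """Add a new neighbour at the end of the link list."""
        self.links.append(other)

    def find(self, name):
        """Return (neighbour, steps) and move the match to the front of the list."""
        for steps, node in enumerate(self.links, start=1):
            if node.name == name:
                self.links.insert(0, self.links.pop(steps - 1))
                return node, steps
        return None, len(self.links)


tools = Node("tools")
for tool_name in ("hammer", "saw", "drill"):
    tools.connect(Node(tool_name))

first_lookup = tools.find("drill")[1]   # scans the whole list
second_lookup = tools.find("drill")[1]  # the match is now at the front
```

The first lookup of "drill" pays for a scan of every link; the second finds it in one step because the previous lookup moved it to the front.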

/me thinks timers and triggers restructure the graph to optimize paths based on proven sequences of connections. Nodes provide feedback to grade the response.

I recall that the brain stores images of objects as images; a circle in the real world matches a circle of neurons in your mind. The brain's neurons chase down the connections that lead to the properties of a circle, and the circle itself. That sounds like Sutra and Storage. AI is the Holy Grail of computer science, and has proven to be a graveyard for many programmers. Note to self, get real, keep job.
