Sutra Data Model : Sinzui

The Sutra data model is strongly influenced by the Friend of a Friend (FOAF) vocabulary, originally proposed by Edd Dumbill. In this draft of Sutra's ontology, the world is made up of two general kinds of objects: Agents and Documents. Noticeably absent from the diagram below are events and a class representing services and software; they will be addressed in the future. Sutra uses RDF Schema to define object types, but XML Schema and similar mechanisms could be used in the future. In most cases, a resource may have many instances of a particular property: a person may have made (property) many resources, and each of those will have several topics (properties).

Things that cause change are represented by a resource called an Agent (rdfs:Class). The agent's properties describe its online identity. The most common form (rdfs:subClassOf) of agent is a Person, whose properties may describe real-world attributes. Agents may also be represented (rdfs:subClassOf) as Groups, Projects, or Organizations. Sutra uses Agent because it is a crucial resource for establishing relationships between resources. Much of the agent information will be derived from, and interact with, personal information applications like Evolution and external sources like LDAP and Web sites. See the FOAF documentation for the definition of Agents and their properties.
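The subclass relationships described above can be sketched as a small lookup table. This is only an illustration: the prefixed class names and the is_a helper are assumptions made for the example, not Sutra's actual schema or API.

```python
# Hypothetical subclass table for the agent classes named in the text.
# Keys are subclasses, values are their direct superclass. The prefixed
# names are illustrative labels, not taken from a published Sutra schema.
SUBCLASS_OF = {
    "foaf:Person": "foaf:Agent",
    "foaf:Group": "foaf:Agent",
    "foaf:Project": "foaf:Agent",
    "foaf:Organization": "foaf:Agent",
}


def is_a(cls, ancestor):
    """Return True if cls is ancestor or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        # Walk up one level; classes with no recorded parent end the loop.
        cls = SUBCLASS_OF.get(cls)
    return False
```

With this table, is_a("foaf:Person", "foaf:Agent") holds, while the reverse does not, which is the property Sutra relies on when it treats every Person, Group, Project, or Organization as an Agent.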

Documents are things that agents create. They are digital artifacts that represent information in Sutra. The properties of the basic Document correspond well with both Nautilus and Web browsers. Nautilus displays document type, thumbnail, icons, and emblems, which compare well to format, thumbnail, logo, and topic respectively. Note that Medusa currently indexes emblems as keywords for searches. Web browsers use content-type, icons, and keywords/classification for bookmarks, which likewise agree with format, thumbnail, logo, and topic.

A Document's properties can be augmented with Posix and DC (Dublin Core) properties from a file system or the Internet. A document may have only one of each property from Posix or DC. There are many subclasses of Document that represent basic kinds of data and their intrinsic properties. Audio files have recording/playback properties, and some, like OGG, will have tag properties (comparable to ID3) about artist and title. Image metadata varies considerably between formats; most have dimension properties, and the EXIF subclass used by digital cameras adds further properties. The Video type is an example of what might be done; more research is needed to understand what technical and creative properties are used. The XML document type only offers encoding and language because little is known about a grammar without a schema. It is feasible to parse XML for known metadata grammars, like Dublin Core. Two common XML (and, for many older forms, SGML) grammars are HTML (Page) and DocBook. The Office type represents writings and spreadsheets from office suite software, as well as PDF/PostScript documents.
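The difference between ordinary properties, which may repeat, and Posix or DC properties, which may appear only once per document, can be sketched as follows. This is a minimal illustration under stated assumptions: the Document class, its add and get methods, and the "posix:" and "dc:" name prefixes are all hypothetical, chosen only to make the rule concrete.

```python
from collections import defaultdict

# Assumption: properties carry a prefix, and these two prefixes mark the
# property sets that allow only one value per document.
SINGLE_VALUED_PREFIXES = ("posix:", "dc:")


class Document:
    """Hypothetical property store for a single document."""

    def __init__(self):
        # Each property name maps to the list of values recorded for it.
        self._props = defaultdict(list)

    def add(self, name, value):
        """Record a value, rejecting a second value for Posix/DC properties."""
        if name.startswith(SINGLE_VALUED_PREFIXES) and self._props[name]:
            raise ValueError(f"{name} may appear only once per document")
        self._props[name].append(value)

    def get(self, name):
        """Return a copy of the values recorded for a property."""
        return list(self._props.get(name, []))
```

A document built this way accepts any number of topic values, but a second dc:title raises ValueError instead of silently overwriting or duplicating the first.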

In the Sutra data model, subclasses inherit properties from their superclass, so an Office document would have Office properties, plus those of Document, plus those of Resource. It may have additional properties from Posix or DC. If the data model were extended with ad hoc classes, or if Office were redefined via RDFS to descend from another class, it would gain more properties. For instance, if a Usage class representing frequency, access, and by whom were registered as an extension of Resource's properties, Office would acquire them too.
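The inheritance rule above can be sketched by walking the superclass chain and collecting properties along the way. The class table, the Office and Resource property names, and both helper functions are assumptions made for illustration; only the Document properties (format, thumbnail, logo, topic) and the Usage example come from the text.

```python
# Hypothetical class table: each class records its superclass and the
# properties it defines itself. "title" and "page_count" are placeholders.
CLASSES = {
    "Resource": {"parent": None, "properties": ["title"]},
    "Document": {
        "parent": "Resource",
        "properties": ["format", "thumbnail", "logo", "topic"],
    },
    "Office": {"parent": "Document", "properties": ["page_count"]},
}


def properties_of(name):
    """Collect a class's own properties, then those of each superclass."""
    collected = []
    while name is not None:
        entry = CLASSES[name]
        collected.extend(entry["properties"])
        name = entry["parent"]
    return collected


def extend_class(name, extra):
    """Register additional properties on an existing class."""
    CLASSES[name]["properties"].extend(extra)
```

Calling extend_class("Resource", ["frequency", "access", "accessed_by"]) models the Usage extension: afterwards properties_of("Office") includes those three names without Office itself having been touched.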

[Sutra data model]