If you haven't already, skim through the features page to get a sense of the overall goals the technical infrastructure is trying to accomplish. This page dives right in and doesn't really cover what Storage is or does.

Storage is composed of 7 major components:

  1. libstorage - provides the StorageItem, StorageQuery, and StorageStore GObjects (StorageStore eventually; currently this stuff is folded into StorageItem as global functions). These provide the primary ways applications interact with Storage. Applications can query a Store for Items matching a particular set of criteria, or they can get an Item by its Store-unique ID number or by its global URI (storage://HOSTNAME/ID). Items support a few basic operations: you can get/set typed attributes (e.g. myitem.get_int("attribute_name")), you can get/set children of the Item, and in the future you will be able to (and even be strongly encouraged to...) establish callbacks for when the Item's attributes or children change.
  2. libstorage-translators - translators provide a bridge between Storage and flat-file systems. Translators import and export Items and trees of Items into and out of the Store as streams, producing traditional file formats. For example, the XML importer (which is very straightforward since Items are very similar to XML's nodes) allows XML files to be decomposed into StorageItems that are placed in a Store, and allows appropriately marked trees of StorageItems to be exported as XML. While the long-term goal of translators is to provide interoperability with other computers (Windows, Macintosh, etc.) using traditional filesystems, translators work in real time on streams and, combined with the VFS module, allow "legacy" GnomeVFS applications to access data in the Store as usual.
  3. GnomeVFS module - the GnomeVFS module, as briefly described above, bridges the gap between Stores and existing GNOME applications. The URI format for the module was carefully chosen to give applications access to much of the Store's range of features, including queries. For example, storage:///STORAGE_QUERY_HERE/ produces a directory listing of FILEID.desktop files (with nice filenames) pointing to Items with MIME type attributes (i.e. those that constitute the top of a file) matching a particular query. When an application tries to open and read storage:///STORAGE_QUERY_HERE/FILEID.desktop, the VFS module doesn't have to re-run the query; it simply uses libstorage to get the title of the Item with FILEID and returns the contents of a .desktop file with the appropriate fields. The desktop files then point to URIs of the format storage:///FILEID/File's Name.extension. Storage actually doesn't care about the "File's Name.extension" part: every storage:///FILEID/* URI resolves to the same file. A filename is included so that applications have a nice filename to put in their window titlebars, properties dialogues, etc. In short, the VFS module's "virtual structure" is designed to provide lots of features and compatibility with existing apps.
  4. PET - PET is an extremely fast LGPL'd Head-driven Phrase Structure Grammar (HPSG) parser written by Ulrich Callmeier. Combined with a grammar (such as the free English Resource Grammar) it can take a human-language string and produce a feature structure representing a derivation of the string based on the rules defined in the HPSG. Included in this feature structure, at least when produced by grammars following the LinGO grammar matrix, is a Minimal Recursion Semantics (MRS) representation of the phrase. Speaking very loosely, MRS provides a set of scoped predicates and universal quantifiers (e.g. it's something like every(x, white(x) AND horse(x), old(x)), except that MRS has some more complicated ideas like handles that are necessary to preserve inherent ambiguity, the arguments are of varying types, and there is some element of Davidsonian event semantics to consider).
  5. nl-parser - nl-parser's job is to take a string, run it through the rather complicated PET API, extract the resulting information in the form of Minimal Recursion "Semantics", and convert it into something that we can meaningfully process (imbue it with real semantic value). Currently nl-parser uses a lambda calculus (defined in a sort of "semantic grammar" contained within nl-parser, currently only about 30 entries long) in the spirit of Heim & Kratzer and Partee to transform the tree of predicates in the MRS into a set of attribute constraints expressed in some form of relational algebra. This set of constraints can then be transformed into a Storage-compatible SQL query (or eventually a StorageQuery object), which produces the set of objects matching the incoming string. While the particulars of the system built up with lambda calculus are fairly sophisticated and flexible, based on a great deal of work by "giants" on whose shoulders we stand, many of the people behind HPSG and MRS think that avoiding lambda calculus is a major benefit of MRS. I haven't figured out exactly how to process the MRS "more directly", using its ideas of quantifiers rather than neutering the scopal and handle information...but it's something I'm working on.
  6. "Open" applet - as seen in the screenshots, the applet runs user-typed strings through nl-parser, retrieves the matching Items through libstorage, and displays them nicely for the user. The applet is carefully threaded to allow queries to be run and displayed as the user is typing, allowing rapid iterative refinement (parsing + fetching results takes about a quarter second). Eventually the applet needs to provide feedback on the "most constraining" or unintelligible part of the expression, so the user can tell when the computer doesn't understand part of a natural phrase. Also planned is a category refinement system that will allow exploration of data available in addition to the object reference provided by association. The applet also allows searches to be dragged out as virtual folders, allowing expert optimization of common "queries".
  7. SQL database - finally the part I dread talking about, because it's what everyone "latches onto". Yes, Storage is database backed, in our case primarily by PostgreSQL, though care has been taken to use SQL99 in the hopes of being able to use other SQL servers such as (notably) Oracle. Currently we rely on one Postgres-specific feature, LISTEN/NOTIFY, in order to provide notification callbacks when Items are changed by other processes. Oracle provides comparable features, but AFAIK no standard exists for this feature, so we'd have to hard-code support database by database. Information is currently stored in the database in the form of "attribute | value" pairs in a sort of information soup. While we were initially skeptical that this would provide the desired performance, tests on large sets of data have so far been promising. We are very pleased with Postgres, though the way SQL does typing makes certain Storage features tricky. Ideally it would be nice to use a variant data type, but instead we have had to provide multiple "AttributeSoup" tables, each containing a different type of value (integer, float, string, date, etc).
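
To make the Item model (component 1) and the XML translator (component 2) a bit more concrete, here is a rough Python sketch. It is not libstorage: Item, Store, import_xml, export_xml, and the "xml:tag" / "xml:text" attribute names are all made up for illustration, and everything lives in memory rather than in a real Store. It only shows why the XML case is straightforward, since each XML element maps onto one Item with attributes and children.

```python
import xml.etree.ElementTree as ET


class Item:
    """Hypothetical in-memory stand-in for a StorageItem (not the real GObject)."""

    def __init__(self, item_id):
        self.item_id = item_id
        self.attributes = {}
        self.children = []

    def set_string(self, name, value):
        self.attributes[name] = str(value)

    def get_string(self, name):
        return self.attributes[name]


class Store:
    """Hypothetical Store that only hands out Store-unique ID numbers."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def new_item(self):
        item = Item(self._next_id)
        self._items[item.item_id] = item
        self._next_id += 1
        return item

    def get(self, item_id):
        return self._items[item_id]


def import_xml(store, xml_text):
    """Decompose an XML document into a tree of Items; returns the root Item."""

    def build(element):
        item = store.new_item()
        # "xml:tag" and "xml:text" are invented bookkeeping attributes.
        item.set_string("xml:tag", element.tag)
        if element.text and element.text.strip():
            item.set_string("xml:text", element.text.strip())
        for name, value in element.attrib.items():
            item.set_string(name, value)
        for child in element:
            item.children.append(build(child))
        return item

    return build(ET.fromstring(xml_text))


def export_xml(item):
    """Export a tree of Items produced by import_xml back to an XML string."""

    def build(node):
        attrs = {k: str(v) for k, v in node.attributes.items()
                 if not k.startswith("xml:")}
        element = ET.Element(node.get_string("xml:tag"), attrs)
        element.text = node.attributes.get("xml:text")
        for child in node.children:
            element.append(build(child))
        return element

    return ET.tostring(build(item), encoding="unicode")
```

This sketch drops mixed content (text that follows a child element) and works on whole strings rather than streams, which the real translators do not.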
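
The URI scheme from component 3 can be sketched in a few lines of Python. This is an illustration, not the GnomeVFS module's code: the function names are invented, and I am assuming that FILEID is a plain decimal number (the text above only says Items have a Store-unique ID number). The .desktop keys used (Type=Link, Name, URL) come from the freedesktop Desktop Entry format; which fields the real module writes is also an assumption.

```python
from urllib.parse import quote, urlsplit


def file_id_from_uri(uri):
    """Return the FILEID of storage:///FILEID/anything, ignoring the filename part."""
    parts = urlsplit(uri)
    if parts.scheme != "storage":
        raise ValueError(f"not a storage URI: {uri!r}")
    # Only the first path segment matters; the rest is a display name.
    first = parts.path.lstrip("/").split("/")[0]
    if not first.isdecimal():  # assumption: FILEIDs are decimal numbers
        raise ValueError(f"first path segment is not a FILEID: {first!r}")
    return int(first)


def desktop_entry(file_id, title, extension):
    """Build the text of a .desktop link pointing at storage:///FILEID/Title.ext."""
    filename = f"{title}.{extension}"
    url = f"storage:///{file_id}/{quote(filename)}"
    lines = ["[Desktop Entry]", "Type=Link", f"Name={title}", f"URL={url}", ""]
    return "\n".join(lines)
```

Both storage:///42/File's Name.txt and storage:///42/whatever go to file_id_from_uri and come back as 42, which is the "Storage doesn't care about the filename" behaviour described above.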
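
Finally, a small sketch of the "attribute soup" idea from component 7: one table per value type, each holding (item, attribute, value) rows. To keep it runnable without a server it uses Python's built-in sqlite3 module in place of PostgreSQL, and the table names, column names, and helper functions are all hypothetical; the real schema may look quite different.

```python
import sqlite3

# One soup table per value type (names are made up for this sketch).
TABLES = {int: "soup_integer", float: "soup_float", str: "soup_string"}


def open_store():
    """Create an in-memory database with one soup table per type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE soup_integer (item_id INTEGER, attribute TEXT, value INTEGER)")
    conn.execute(
        "CREATE TABLE soup_float (item_id INTEGER, attribute TEXT, value REAL)")
    conn.execute(
        "CREATE TABLE soup_string (item_id INTEGER, attribute TEXT, value TEXT)")
    return conn


def set_attribute(conn, item_id, attribute, value):
    """Store a typed attribute in the table that matches the value's type."""
    table = TABLES[type(value)]  # table name comes from the fixed dict above
    conn.execute(f"DELETE FROM {table} WHERE item_id = ? AND attribute = ?",
                 (item_id, attribute))
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)",
                 (item_id, attribute, value))


def get_int(conn, item_id, attribute):
    """Return an integer attribute, or None if the Item does not have it."""
    row = conn.execute(
        "SELECT value FROM soup_integer WHERE item_id = ? AND attribute = ?",
        (item_id, attribute)).fetchone()
    return None if row is None else row[0]


def find_items(conn, mime_type, min_size):
    """Find Items with a given 'mime_type' string and a 'size' of at least min_size."""
    rows = conn.execute(
        """
        SELECT s.item_id
        FROM soup_string AS s
        JOIN soup_integer AS i ON i.item_id = s.item_id
        WHERE s.attribute = 'mime_type' AND s.value = ?
          AND i.attribute = 'size' AND i.value >= ?
        ORDER BY s.item_id
        """,
        (mime_type, min_size))
    return [row[0] for row in rows]
```

Note how a query that constrains both a string and an integer attribute has to join two soup tables; that is the kind of trickiness SQL typing introduces, and a variant column type would avoid it.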