The following is an article I wrote for some friends, to help them understand RDF after coming from a relational database point of view. It might be a little inaccurate in places, but the concepts are fairly sound.

Feel free to print, post and/or modify: I'm putting this into the public domain. Attribute it to me (aredridel@nbtsc.org) if you like, but that's not a requirement either.

Ari

------------------------------------------------------------------------

A quick intro to the RDF data model.

Traditionally, the box that everyone shoehorns their data into is the relational database. For an example, consider two tables:

restaurants:
    name (a string, and also our primary key)
    approximate price (an integer)
    cuisine (a string, or perhaps an enumerated type)

menu:
    restaurant (referencing restaurants.name as a foreign key)
    item (string; restaurant+item is the primary key)
    cost (numeric)

It's a 1:N relation, where each restaurant has many items on the menu.

Some sample data, in human readable format:

restaurants:
    "Joe's", "$7", "American"
    "Wing's", "$12", "Chinese"

menu:
    "Joe's", "Sloppy Joe", "$5"
    "Joe's", "Soup", "$4"
    "Joe's", "Bottomless Coffee", "$2"
    "Wing's", "General's Chicken", "$11"
    "Wing's", "Braised Shrimp", "$12"
    "Wing's", "Tea", "$1"

There are limits to this model.

First, the schema is machine readable, but not machine reasonable. As far as any computer is concerned, the data is completely arbitrary. SQL and most relational systems offer only limited ways to constrain the data in a column, and asking domain-specific questions is out of the scope of the query language entirely. The field names are simple strings with no meaning to the engine, and there is no portable way to give them one.

Second, there are namespace problems. Consider trying to merge another database with a similar field -- say the "price" field is an enumerated type like "Reasonable", "Expensive", "Downright cheap" in one database, and an approximate per-meal figure in the other.
You'd have a hard time merging the data. Real world examples are far worse. In practice, one simply doesn't merge data that doesn't merge cleanly.

Third, if each item has many possible attributes, and most are not known or available, you store "nil" (NULL) for each undefined field. Storage and indexing requirements grow accordingly, even though for very descriptive data sets most values could be nil. Alternative values for the same attribute require separate fields, or a separate relation with the overhead that brings.

Fourth, SQL is not reflective. Storing data about data and querying it in a relational manner is usually not possible, and certainly not portable: the query "find all tables containing columns that contain numerical data" is impossible in most relational systems, and ugly in the few where it is possible.

Fifth, integrating vocabulary (or fields) from vastly different sources is extremely difficult, due to the lack of standardization of keys and the lack of reflection. One cannot talk about SQL statements in SQL.

RDF solves each of these to varying degrees.

On the surface, RDF is an abstraction of the traditional SQL relational model into a more mathematically and logically pure rendition. It is also a set of standards for interoperation of databases and knowledge systems with the web as we know it and will know it.

RDF organizes the data into "statements", each of which has a subject, a predicate (or transitive verb) and an object. Each of these is represented by a URI (Uniform Resource /Identifier/), which is a unique name formatted like a URL and arranged in a hierarchical namespace; the object may instead be a literal string value. URIs are a superset of URLs -- they can act as names, locators or both.
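As a rough illustration of the statement model, here is a minimal Python sketch that treats each statement as a three-part tuple. The class names, the helper `objects_of` and the `hasParking` predicate are my own inventions for this example, and the URIs are just illustrative names, not real pages. Note that an attribute nobody has stated simply has no statement: there is no "nil" to store.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI that names a thing."""
    value: str


class Statement(NamedTuple):
    subject: str               # a URI
    predicate: str             # a URI
    obj: Union[str, Literal]   # a URI or a literal value


# Illustrative names only (assumptions for this sketch).
FOODPRED = "http://www.chefmoz.org/syntax#"
JOES = "http://restaurantlist.org/Joes"

statements = [
    Statement(JOES, FOODPRED + "costsAbout", Literal("$7")),
    Statement(JOES, FOODPRED + "servesCuisine",
              "http://www.chefmoz.org/terms/AmericanFood"),
]


def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [s.obj for s in statements
            if s.subject == subject and s.predicate == predicate]


print(objects_of(JOES, FOODPRED + "costsAbout"))
# An attribute that was never stated yields an empty list, not a nil column.
print(objects_of(JOES, FOODPRED + "hasParking"))
```

A real RDF library would do far more (parsing, indexing, typed literals); this only shows the shape of the data.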
The same data as shown above might look like this in RDF triples:

    http://restaurantlist.org/Joes http://www.chefmoz.org/syntax#costsAbout "$7"
    http://restaurantlist.org/Joes http://www.chefmoz.org/syntax#servesCuisine http://www.chefmoz.org/terms/AmericanFood

and so on. For the sake of simplicity, let's define some aliases:

    foodpred: means "http://www.chefmoz.org/syntax#"
    foodterms: means "http://www.chefmoz.org/terms/"
    joesmenu: means "http://www.joes.com/menu/items/"

All that's fairly normal XML namespace stuff, which RDF uses as well, with a few more semantics added. It's purely for notational convenience in XML, though far more important in RDF.

So now, the full data:

    http://restaurantlist.org/Joes foodpred:costsAbout "$7".
    http://restaurantlist.org/Joes foodpred:servesCuisine foodterms:AmericanFood.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:SloppyJoes.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:BottomlessCoffee.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:Soup.
    joesmenu:Soup foodpred:costs "$4".
    joesmenu:SloppyJoes foodpred:costs "$5".
    joesmenu:BottomlessCoffee foodpred:costs "$2".

    http://restaurantlist.org/Wings foodpred:costsAbout "$12".
    http://restaurantlist.org/Wings foodpred:servesCuisine foodterms:Chinese.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#GeneralsChicken.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#BraisedShrimp.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#Tea.
    http://wings.cn/items#Tea foodpred:costs "$1".
    http://wings.cn/items#GeneralsChicken foodpred:costs "$11".
    http://wings.cn/items#BraisedShrimp foodpred:costs "$12".

    http://restaurantlist.org/Joes _:isCalled "Joe's Diner".
    http://restaurantlist.org/Wings _:isCalled "Wing's Chinese Food".

That's a lot to digest, but think of it in English. It's a paragraph that says: "Joe's costs about $7 per person; they serve Sloppy Joes, Bottomless Coffee and Soup.
The soup costs $4. The sloppy joes cost $5. The coffee costs $2" and so on for the Chinese food too.

If one merged in another vocabulary, or started using "fields" from it, then one might end up with a more powerful and globally understood data set:

    http://wings.cn/items#BraisedShrimp rdf:type http://xmlns.com/wordnet/1.6/EthnicFood
    joesmenu:BottomlessCoffee rdf:type http://xmlns.com/wordnet/1.6/HotBeverage

And now, any RDF consuming app may be able to make some basic assertions about the restaurants and what they sell, if it's been taught the wordnet vocabulary. The wordnet vocabulary has been exported as RDF, with assertions like:

    Mutt isKindOf Dog
    Dog isSubclassOf Canine
    Canine isSubclassOf Mammal
    Mammal isSubclassOf Vertebrate
    Vertebrate isSubclassOf Animal

and so on. From the simple assertions that Wing's has Braised Shrimp on its menu, and that Braised Shrimp is an ethnic food, Wing's can now be found by anyone looking for ethnic cuisine -- or by anyone looking for one of the superclasses of ethnic cuisine, assuming WordNet actually defines any of those classes.

One big clincher here is in the schema language: the schema is RDF as well. It's a series of statements about predicates (the middle column) and the types of the subjects and objects (first and third columns).

    foodpred:costsAbout rdf:type rdf:Property
    foodpred:costsAbout rdfs:domain http://xmlns.com/wordnet/1.6/Restaurant

These describe what "costsAbout" means: it is a predicate, and the things it is stated about are restaurants. There are additional predicates one could use to define data types and the like.

Now, in the example above, I invented a URI for both Wing's and Joe's. That's not good form, but it's easier to explain. All that's needed is a unique ID for that thing -- like the primary key in a relational database.
A better thing might be:

    _:123141 instead of http://restaurantlist.org/Joes

and

    _:654123 instead of http://restaurantlist.org/Wings

That seems to remove a bit of useful info, but it also takes restaurantlist.org out of the equation, and who says theirs is the official URI for Wing's anyway? There may be many candidates or none. One could use http://wings.cn/, but a name registered by someone else makes your database dependent on them not going global and becoming http://wings.com/. Perhaps this is your personal restaurant list, and depending on restaurantlist.org to be "the" place to list restaurants is a little too shaky: in fifty years, will they still be the place on the net to go find food? Or maybe this is Jane's mom-and-pop diner, and they're not net-savvy yet.

This isn't a problem: all one has to do is add another statement:

    _:654123 hasHomepage http://www.wings.cn/
    _:654123 isOwnedBy urn:x-us-ssn:123-56-1234
    _:123141 isOwnedBy urn:x-us-ein:13976-45-12
    urn:x-us-ein:13976-45-12 hasHomepage http://restaurantsinc.biz

Now, one can query for "Wings" by homepage: you can ask "find me the menu for the restaurant whose homepage is http://www.wings.cn". That's not the same as asking "find me the menu for http://wings.cn", since these are URIs, not URLs -- they're just names, not links.

A more obvious case is Jane's diner: "find me the menu of the restaurant called "Eat at Jane's" that is located in Tallahassee, FL" -- a good thing to be able to ask, since Jane's has no URL, and so making an intelligent and unique URI that's globally meaningful is not going to happen.

Nodes like _:654123 are called "blank nodes" -- blank but unique spots in the information space. You can talk about them, but there's no single handle to get ahold of them by -- just like in the real world. One person knows it as "the restaurant on the corner of Main and Third", another knows it as "the restaurant Jane runs", and another knows it by the name they filed on their liquor license application.
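Finding a blank node by description rather than by name can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names, the `locatedIn` predicate and Jane's menu item are made up for the example, and the blank node labels are generated locally, since they mean nothing outside this one data set.

```python
import itertools

_ids = itertools.count(1)


def new_blank_node():
    """Make a label that is unique within this data set and meaningless outside it."""
    return f"_:b{next(_ids)}"


wings = new_blank_node()
janes = new_blank_node()

triples = [
    (wings, "hasHomepage", "http://www.wings.cn/"),
    (wings, "hasMenuItem", "http://wings.cn/items#Tea"),
    (wings, "hasMenuItem", "http://wings.cn/items#BraisedShrimp"),
    (janes, "isCalled", "Eat at Jane's"),
    (janes, "locatedIn", "Tallahassee, FL"),        # hypothetical predicate
    (janes, "hasMenuItem", "Blue Plate Special"),   # made-up menu item
]


def find_by_description(description):
    """Return the subjects for which every (predicate, object) pair holds."""
    facts = set(triples)
    subjects = {s for s, _, _ in triples}
    return {s for s in subjects
            if all((s, p, o) in facts for p, o in description)}


def menu_of(subject):
    """Return the menu items stated for a subject, sorted for stable output."""
    return sorted(o for s, p, o in triples
                  if s == subject and p == "hasMenuItem")


# "Find me the menu of the restaurant called Eat at Jane's in Tallahassee, FL."
for restaurant in find_by_description([("isCalled", "Eat at Jane's"),
                                       ("locatedIn", "Tallahassee, FL")]):
    print(menu_of(restaurant))
```

The caller never needs to know the label `_:b2`; the description is the handle.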
There's a query language for all this (well, several -- one is called RDQL, and a subset is Squish, shown here):

    SELECT ?name, ?price
    FROM restaurants.nt
    WHERE
        (?x rdf::type wn::Restaurant)
        (?x fp::costsAbout ?price)
        (?x _::isCalled ?name)
        (?x fp::hasMenuItem ?c)
        (?c rdf::type wn::EthnicFood)
    USING
        rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
        fp for http://www.chefmoz.org/syntax#
        wn for http://xmlns.com/wordnet/1.6/
        _ for <>

The results?

    "Wing's Chinese Food" "$12"

I'm in the mood for Chinese, so let's go eat at Wing's.
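For the curious, here is a rough Python sketch of how a query like the one above can be answered: each pattern in the WHERE clause is matched against every triple, and variable bindings are carried from one pattern to the next. This is my own toy matcher, not how any particular RDQL or Squish engine works. It uses the EthnicFood class asserted earlier, and it assumes the "is a Restaurant" statements are written out explicitly, which the sample data never did (a real engine might infer them from the rdfs:domain statement). The short subject names are abbreviations for this example.

```python
def match(pattern, triple, bindings):
    """Try to extend bindings so that pattern equals triple; None if impossible."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Return every set of bindings that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [extended
                     for bindings in solutions
                     for triple in triples
                     if (extended := match(pattern, triple, bindings)) is not None]
    return solutions


triples = [
    ("Joes", "rdf:type", "wn:Restaurant"),      # assumed, not in the sample data
    ("Joes", "fp:costsAbout", "$7"),
    ("Joes", "_:isCalled", "Joe's Diner"),
    ("Joes", "fp:hasMenuItem", "joesmenu:Soup"),
    ("Wings", "rdf:type", "wn:Restaurant"),     # assumed, not in the sample data
    ("Wings", "fp:costsAbout", "$12"),
    ("Wings", "_:isCalled", "Wing's Chinese Food"),
    ("Wings", "fp:hasMenuItem", "wings:BraisedShrimp"),
    ("Wings", "fp:hasMenuItem", "wings:Tea"),
    ("wings:BraisedShrimp", "rdf:type", "wn:EthnicFood"),
]

where = [
    ("?x", "rdf:type", "wn:Restaurant"),
    ("?x", "fp:costsAbout", "?price"),
    ("?x", "_:isCalled", "?name"),
    ("?x", "fp:hasMenuItem", "?c"),
    ("?c", "rdf:type", "wn:EthnicFood"),
]

results = [(b["?name"], b["?price"]) for b in query(triples, where)]
print(results)
```

Joe's drops out at the last pattern because none of its menu items is typed as ethnic food, which leaves only Wing's.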