Semantic Web/The Pedantic Web

Unfortunately, there is an air of academia and corporate thinking lingering in the Semantic Web community, which has led to the term "Pedantic Web" being coined, and a lot of mis/disinformation and unnecessary hype being disseminated. Note that this very document was devised to help clear up some common misconceptions that people may have about the Semantic Web.

For example, almost all beginners to RDF go through a sort of "identity crisis" phase, where they confuse people with their names, and documents with their titles. For example, it is common to see statements such as:-

<> dc:creator "Bob" .

However, Bob is just a literal string, so how can a literal string write a document? What the author really means is:-

<> dc:creator _:b . _:b foaf:name "Bob" .

i.e., that was created by someone whose name is "Bob". Tips like these are being slowly collected, and some of them are being displayed in the SWTips guide, a collection of Semantic Web hints and tips maintained as a collaborative development project.

Education And Outreach

The move away from the "Pedantic Web", to some extent, is all part of a movement to bring the power of the Semantic Web to the people. This is a well documented need:-

[...] the idea that the above URIs reveal a schema that somehow fully describes this language and that it is so simple (only two {count 'em 2} possible "statements"), yet looks like the recipe for flying to Mars is a bit daunting. Its very simplicity enables it to evaluate and report on just about anything - from document through language via guidelines! It is a fundamental tool for the Semantic Web in that it gives "power to the people" who can say anything about anything.

- EARL for dummies, William Loughborough, May 2001

RDF Schema and DAML+OIL are generally languages that need to be learned, however, so what is being done to accommodate people who have neither the time nor patience to read up on these things, and yet want to create Semantic Web applications? Thankfully, many Semantic Web applications will be lower end applications, so you'll no more need to have a knowledge of RDF than Amaya requires one to have a knowledge of (X)HTML. Trust and Proof

The next step in the architecture of the Semantic Web is trust and proof. Very little is written about this layer, which is a shame since it will become very important in the future.

In stark reality, the simplest way to put it is: if one person says that x is blue, and another says that x is not blue, doesn't the whole Semantic Web fall apart?

The answer is of course not, because a) applications on the Semantic Web at the moment generally depend upon context, and b) because applications in the future will generally contain proof checking mechanisms, and digital signatures. Context

Applications on the Semantic Web will depend on context generally to let people know whether or not they trust the data. If I get an RDF feed from a friend about some movies that he's seen, and how highly he rates them, I know that I trust that information. Moreover, I can then use that information and safely trust that it came from him, and then leave it down to my own judgement just to how much I trust his critiques of the films that he has reviewed.

Groups of people also operate on shared context. If one group is developing a Semantic Web depiction service, cataloguing who people are, what their names are, and where pictures of those people are, then my trust of that group is dependent upon how much I trust the people running it not to make spurious claims.

So context is a good thing because it lets us operate on local and medium scales intuitively, without having to rely on complex authentication and checking systems. However, what happens when there is a party that we know, but we don't know how to verify that a certain heap of RDF data came from them? That's where digital signatures come in.

In general, there are small- and large-scale systems, and interactions between the two will most likely form a huge part of the transactions that occur on the Semantic Web. Let's define what we mean by large-, medium-, and small-scale systems.

Large Scale

An example of a large-scale system is two companies that are undergoing a merger needing to combine their databases. Another example would be search engines compiling results based upon a huge range of data. Large-scale Semantic Web systems generally involve large databases, and heavy duty inference rules and processors are required to handle the databases. Medium Scale

Medium-scale Semantic Web systems attempt to make sense out of the larger-scale Semantic Web systems, or are examples of small-scale Semantic Web systems joined together. An example of the former is a company trying to partially understand two large-scale invoice formats enough to use them together. An example of the latter is of two address book language groups trying to create a super-address book language.

Small Scale

Small-scale Semantic Web systems are less widely discussed. By small-scale Semantic Web systems, we mean languages that will be used primarily offline, or piles of data that will only be transferred with a limited scope, perhaps between friends, departments, or even two companies.

Sharing data on a local level is a very powerful example of how the Semantic Web can be useful in a myriad of situations. In the next section on evolution we shall be finding out how interactions between the different sized systems will form a key part of the Semantic Web. SEM - SEmantic Memory

The concept of a SEmantic Memory was first proposed by Seth Russell, who suggested that personal database dumps of RDF that one has collected from the "rest" of the Semantic Web (a kind of Semantic Cloud) would be imperative for maintaining a coherent view of data. For example, a SEM would most likely be partitioned into data which is inherent to the whole Semantic Web (i.e., the schemata for the major languages such as XML RDF, RDF Schema, DAML+OIL, and so on), local data which is important for any Semantic Web applications that may be running (e.g. information about the logic namespace for CWM, which is currently built in), and data that the person has personally been using, is publishing, or that has been otherwise entered into the root context of the SEM.

The internal structure of a SEM will most likely go well beyond the usual triples structure of RDF, perhaps as far as quads or even pents. The extra fields are for contexts (an StID), and perhaps sequences. In other words, they are ways of grouping information within the SEM, for easy maintenance and update. For example, it should become simple to just delete any triple that was added into a certain context by removing all triples with that particular StID.

A lot of work on the Semantic Web has concentrated on making data stores (i.e., SEMs) interoperable, which is good, but that has lead to less work being conducted on what actually happens within the SEM itself, which is not good, because the representation of quads and pents in RDF is therefore up in the air. b developers to be investigating at this stage.