Sandboxing an RfC before we end up making it live so that I make sure I got formatting etc right :)



Proposal to implement metadata on Wikibooks

Hi all -

I’m Max Klein. For the last few months, I’ve been working with Yaron Koren to develop HTML Tags, a new extension for MediaWiki that can support LRMI (Learning Resource Metadata Initiative), a metadata framework for educational resources, as well as other metadata schemas built on schema.org. Metadata frameworks like LRMI are not supported in a default installation of MediaWiki because they use HTML in a way that had not been envisioned when MediaWiki's core parser was written.

Disclosure: our work on HTML Tags has been financed by Creative Commons.

Why use metadata?

We think that using metadata has a lot of potential benefits for a project like Wikibooks. Adding metadata to Wikibooks should make its resources significantly easier to find via search engines. Schema.org was formed collaboratively by Bing, Google, Yahoo!, and Yandex, with the explicitly stated aim of making it easier for their users to turn up high-quality, relevant results in their searches. Besides the direct effect on search placement, most major search engines also use metadata like LRMI and schema.org to help generate rich snippets, which improve the preview of a website that a search engine user is shown. Because improving access to information is a primary goal of Wikibooks and the Wikimedia movement, we believe improved search result quality is by itself a compelling reason to implement HTML Tags and LRMI.

We also anticipate that there may be unexpectedly creative uses of HTML Tags in contexts other than pure metadata. Although we cannot guarantee we’ll be able to provide technical support for all such uses, we are certainly excited about them. We’ll support cool side projects where we can, and if something comes up that we can’t support, we’ll try to connect you with volunteers who have the appropriate skill sets to move your project forward.

Readiness and end user experience

The HTML Tags extension is fully developed and has been tested on other MediaWiki installations in the wild. We have also developed a preliminary set of templates to use in conjunction with it. Together, the extension and the templates handle all of the behind-the-scenes work, so all a user has to do to add metadata to a page is fill out the template. We think that makes it as easy as possible to add valid LRMI and schema.org markup to any wiki page - it is no harder than using any other template. We’ve set up a demo wiki on Referata; please feel free to play around with it.

How it works

  • The MediaWiki templates pass LRMI properties through the templates' parameters, with the conversion to valid metadata markup handled by subsequently called templates and the HTML Tags MediaWiki extension. The LRMI templates are currently set up to use the terms mentioned in LRMI’s specification (which you can find here), including the schema.org terms that it mentions. To add metadata to a page, you can either add the {{LRMI-object}} template directly or use the LRMI button we added to the default edit window (which is found three buttons to the right of the button used to italicize text on the demo). Here’s an example of what the LRMI-object template can look like:

Example template

{{LRMI-object
|itemprop=educationalAlignment
|educationalAlignment=Independent study
|intendedEndUserRole=Student
|educationalUse=Reading
|timeRequired=PT30M
|typicalAgeRange=0-12
|interactivityType=non-interactive
|learningResourceType=Wikibook
|useRightsUrl=http://creativecommons.org/licenses/by/3.0/
|isBasedOnUrl=http://en.wikibooks.org/wiki/Wikijunior:Biology/Introduction
|name={{PAGENAME}}
|About=
|dateCreated=
|author=Wikibooks contributors
|publisher=Wikimedia Foundation
|inLanguage=English
|mediaType=Wiki article
}}


Each of the parameters used in this template has a one-to-one correspondence with the terms used in LRMI.org’s specification. For the sake of simplicity, we have used the exact terms that appear in the specification. If a situation arises where it would be helpful to depart from those exact terms, it would be possible to do so without breaking anything, as long as a one-to-one correspondence is maintained between the parameters in the template and the LRMI and schema.org specifications. Although the template currently includes only terms from the LRMI specification, it would be trivially easy to add any other terms that appear in schema.org’s specification if they would be useful on Wikibooks. When we have finalized the terms that will appear in the template, we’ll copy descriptions of each term into the on-wiki documentation for the template.

You can see a demonstration of the use of this template at this page on our demo wiki (which contains part of Wikijunior:Biology/Introduction, copied over with metadata added to it). The addition of metadata doesn’t change what is displayed to the average viewer - you can only see the metadata in the editing window, or if you run the page through Google’s rich snippets testing tool.

Under the hood

  • The template {{LRMI-object}} calls two templates under the hood: {{LRMI-span}} and {{LRMI-meta}}. These in turn call HTML's <span> and <meta> tags, which is where schema.org and LRMI metadata are specified. The last template, {{LRMI}}, can be used to tag pages with single metadata properties. You can see a demonstration of how it works on this page of the demo wiki. We don’t anticipate that it will be used on Wikibooks nearly as frequently as {{LRMI-object}}, which bundles all the LRMI properties into one template.
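To give a feel for the kind of markup that ends up on the page, here is a minimal Python sketch of the idea. It is only an illustration, not the templates' or the extension's actual code (HTML Tags is a MediaWiki extension, and the real conversion is done by the templates described above). The function name `render_lrmi_microdata` is hypothetical, and wrapping everything in a single `<span>` typed as schema.org's CreativeWork is an assumption made for the example.

```python
from html import escape


def render_lrmi_microdata(properties, item_type="http://schema.org/CreativeWork"):
    """Render a dict of property names and values as invisible microdata.

    Hypothetical helper for illustration only; the real templates may
    structure their output differently.
    """
    lines = [f'<span itemscope itemtype="{escape(item_type)}">']
    for name, value in properties.items():
        if not value:
            continue  # skip parameters that were left blank in the template
        lines.append(
            f'  <meta itemprop="{escape(name)}" content="{escape(value)}" />'
        )
    lines.append("</span>")
    return "\n".join(lines)


# A few of the parameters from the example template above.
example = {
    "name": "Wikijunior:Biology/Introduction",
    "educationalUse": "Reading",
    "dateCreated": "",  # left blank, so it is not emitted
    "inLanguage": "English",
}
markup = render_lrmi_microdata(example)
print(markup)
```

Because `<meta>` tags carry their value in the `content` attribute rather than in visible text, markup of this kind adds machine-readable properties without changing what a reader sees.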

We think that this set of templates, coupled with HTML Tags, would offer a lot of advantages for Wikibooks. We’ve written a brief FAQ-style document and transcluded it below - hopefully, it’ll answer more of the questions that some of you may have, without just being a giant wall of text. If you have questions that aren’t answered in the overview or the FAQ, please feel free to ask them :)

FAQ

Who are we?

We are Max Klein of UntrikiWiki.com and Yaron Koren of WikiWorks.com. Max has worked as Wikipedian-in-Residence for OCLC Research, where he created VIAFbot. Yaron, a longtime MediaWiki developer, administrator and consultant, has recently published a manual on MediaWiki, Working with MediaWiki.

The development of HTML Tags is funded by a grant through Creative Commons, the Mountain View-based open tech nonprofit. Creative Commons is responsible for the creation and maintenance of the Creative Commons licenses that Wikibooks (and other Wikimedia projects) use, as well as a bunch of other fun stuff. This project doesn’t have a commercial purpose - Creative Commons' mission (which we share) is to develop and support the technical infrastructure necessary to maximize digital creativity, sharing, and innovation.

What is metadata?

Descriptive metadata is structured information that describes the content that it is associated with - it’s information about information. It can include stuff like how long content is expected to take to consume, what the topic of the content is, who the content is aimed at, and other similar information.

What is LRMI?

LRMI is a joint project co-led by Creative Commons and the Association of Educational Publishers (the only professional organization that covers the entire educational resources community). The standard was developed in an open and collaborative process that actively sought to involve all major stakeholders as well as the general public. The advisory group for the initiative had members from Scholastic, Pearson, Houghton Mifflin Harcourt, Curriki, and McGraw Hill, among others. The technical working group involved many people with relevant expertise, including the head of the Dublin Core Metadata Initiative and people from Creative Commons, Microsoft, and the Gates Foundation, as well as Wikipedian (and UCB professor) Brian Carver. (You can see a full list of members of both the advisory group and the technical working group here.)

What is Schema.org?

Schema.org is a joint project between Google, Bing, Yahoo, and Yandex - four of the world’s biggest search engines that collectively account for more than 96% of all web searches worldwide - that aims to develop a collection of metadata schemas that can be used by webmasters to provide search engines with extra information about their content so that search engines can improve the quality of their results.

Why LRMI and Schema.org over competing standards?

There are competing metadata standards, but to be useful, metadata standards must be supported by tools. We believe that the advantage inherent to being supported by four of the world’s biggest search engines means that, inevitably, schema.org will win out over its competitors.

If we are wrong and another metadata standard eventually supplants LRMI/schema.org, then from a technical standpoint HTML Tags will be able to support other metadata standards with a relatively small amount of modification. Where there are one-to-one parameter equivalencies, a bot could be employed to convert pre-existing LRMI/schema.org parameters automatically.
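As a rough sketch of what such a conversion could look like, the Python below renames template parameters using a one-to-one mapping. Everything in it is an assumption made for illustration: the function name `convert_parameters` is hypothetical, and the target names in `HYPOTHETICAL_MAPPING` are invented rather than taken from any particular standard. A real bot would also need to read and write the wiki pages themselves, which is not shown here.

```python
def convert_parameters(params, mapping):
    """Rename template parameters using a one-to-one mapping.

    Parameters without an equivalent are returned separately so that a
    human can review them instead of having them silently dropped.
    """
    # A one-to-one mapping must not send two source names to the same target.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    converted, unmapped = {}, {}
    for name, value in params.items():
        if name in mapping:
            converted[mapping[name]] = value
        else:
            unmapped[name] = value
    return converted, unmapped


# Hypothetical target names, invented purely for illustration.
HYPOTHETICAL_MAPPING = {
    "name": "title",
    "author": "creator",
    "inLanguage": "language",
}

converted, unmapped = convert_parameters(
    {
        "name": "Introduction",
        "author": "Wikibooks contributors",
        "educationalUse": "Reading",
    },
    HYPOTHETICAL_MAPPING,
)
print(converted)
print(unmapped)
```

Keeping the unmapped parameters separate is a deliberate choice in this sketch: only the terms with a clear equivalent are converted automatically, and anything else is left for an editor to decide.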

Why should we use metadata/HTML Tags?

Adding metadata to Wikibooks would have a couple of very beneficial effects in the near term. The biggest of these would be making Wikibooks’ resources easier to find in most major search engines - Schema.org was formed collaboratively by Bing, Google, Yahoo!, and Yandex, with the explicitly stated aim of making it easier for their users to turn up high-quality, relevant results in their searches. Good metadata will also improve the preview of Wikibooks content that is shown in search results, since most major search engines use metadata markup to generate better previews where it is available.

As an example of how this could play out, take a look at the Wikijunior book about the solar system - specifically, its chapter about the Sun. It presents a pretty good overview of the Sun, aimed at elementary school students. Despite being a pretty solid chapter, it doesn’t get very much traffic - only about 220 views a month. This seems like far too little exposure for such a high-quality book. After playing with Google for quite a bit, it became obvious that this chapter was not ranked highly for most relevant keywords - the addition of accurate metadata to this book should make it easier to find via those keywords. (You can’t add metadata to low-quality pages and expect a meteoric search engine boost, but when you add metadata to pages that already have high-quality content, the results can be remarkable.)

We also anticipate that there may be unexpectedly creative uses of HTML Tags to support things other than metadata. Although we cannot guarantee we’ll be able to provide technical support for all such uses, we are certainly excited about them. We’ll support cool side projects where we can, and if something comes up that we can’t support, we’ll try to connect you with volunteers who have the appropriate skill sets to move your project forward.

What would adopting HTML Tags and LRMI involve?

Does HTML Tags pose a security risk?

No.

The set of tags and tag attributes enabled in HTML Tags by default (which is the set necessary to properly support LRMI and schema.org’s metadata schema) poses no security risk. The allowable set of tags and tag attributes is set in LocalSettings.php, and can only be modified by Wikimedia Foundation staff and trusted volunteer developers. Additional tags and tag attributes could be enabled in the future (via Bugzilla request) if they end up being desired at a later date. (Additional metadata terms could be enabled by editing the existing template structure, without needing to go to Bugzilla.)
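The allowlist idea can be illustrated with a short Python sketch. This is not the extension's code or its real configuration: the `ALLOWED` table and the `is_allowed` function are hypothetical, and the specific tags and attributes listed are an assumption based on the `<span>` and `<meta>` tags mentioned earlier plus the standard HTML microdata attributes.

```python
# Hypothetical allowlist: each permitted tag maps to its permitted attributes.
# The real set is whatever is configured for the extension, not this table.
ALLOWED = {
    "span": {"itemscope", "itemtype", "itemprop"},
    "meta": {"itemprop", "content"},
}


def is_allowed(tag, attributes, allowed=ALLOWED):
    """Return True only if the tag and every one of its attributes is listed."""
    tag = tag.lower()
    if tag not in allowed:
        return False  # tags such as <script> are rejected outright
    return all(attr.lower() in allowed[tag] for attr in attributes)


print(is_allowed("meta", ["itemprop", "content"]))
print(is_allowed("script", []))
print(is_allowed("span", ["onclick"]))
```

The point of the design is that anything not explicitly listed is rejected, so enabling a new tag or attribute is always a deliberate configuration change rather than something an editor can do from a wiki page.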

How much of a workload would this add to the community?

Adding metadata to Wikibooks would be a gradual process - no one is going to need to go through 50,000 content-space pages and categorize them all at once. Ideally, editors would slowly begin to add metadata to whatever pages they edit. There would be a couple of ways to go about doing this.

The first would be to add a template to the page by hand, which in most situations would be {{LRMI-object}}. {{LRMI-object}} is similar in structure to the {{Authority control}} template on the English Wikipedia. More information about the templates can be found earlier in this document. The second way to add metadata to a page would be to use the LRMI button that we’ve added to the editing toolbar in our demo wiki (which can be found in the editing window three buttons to the right of the button used to italicize text on the demo).

After the template parameters have been set, the template will, behind the scenes, render the basic template structure into the non-user-friendly HTML required to meet LRMI and schema.org’s exact specifications. Once a sizable portion of Wikibooks’ content has been tagged with metadata, there should begin to be noticeable improvements to things like search engine result quality without any additional effort on the part of editors. This improvement is likely to become even more pronounced as more websites broadly implement metadata schemes.

What if two editors disagree about the appropriateness of a particular tagging?

We don’t envision a special or standalone system for dealing with disagreements of this sort. Hopefully they come up only rarely, but when they come up, it should be possible to handle them using Wikibooks’ existing dispute resolution system. We anticipate that most tagging should be fairly straightforward and that disputes should be uncommon.

Are there any licensing incompatibilities between LRMI/Schema.org/HTML Tags and Wikibooks?

No.

Schema.org is licensed under CC-BY-SA 3.0, the same license that Wikibooks’ content is normally released under. LRMI’s vocabulary is also released under CC-BY-SA. HTML Tags is released under the GPL, which is a free license (and the license that MediaWiki extensions are typically released under). All of these licenses are 100% compatible with being used on Wikibooks.

The two formal proposals

We would like to try to launch a discussion aimed at achieving consensus on two points:

  1. First, that Wikibooks should install the MediaWiki extension HTML Tags.
  2. Second, that the addition of LRMI and schema.org metadata to Wikibooks should be encouraged, and that the template library we’ve developed at the demo should be migrated to Wikibooks.

Once consensus has been established on the first point, the extension will be installed via a Bugzilla request - since it requires shell access. The templates can be transferred and edited to adapt to Wikibooks’ needs at any point as necessary by any editor, since shell access is not needed to do so.

In order to keep the amount of displayed text on this page manageable, we have transcluded most of the material displayed here from this page and this page. I’ll drop a note here if we make any edits to the main body of the RfC. We may edit (and expand) the FAQ as this thread progresses to ensure that all commonly asked questions are answered in our original post. We are using this format so that the collapsed content doesn’t clutter up this page too much. Thanks, Maximilian.Klein.LRMI (discuss • contribs) 01:47, 10 January 2013 (UTC)