Open Metadata Handbook/Introduction

What is metadata?

Definition

There are several definitions of metadata:

according to the National Information Standards Organization (NISO), metadata is "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource."
according to the World Wide Web Consortium (W3C), metadata is "machine understandable information for the web."

Metadata is generally understood as "data about data": data used to describe other data or information by supplying context. In library and information science, metadata is generally used to record the distinctive and contextual characteristics of works (books, articles, photographs, etc.). For example, a library catalog contains information (metadata) about publications (data).

Data versus Metadata

Metadata has no function other than describing a particular piece of data or information so as to improve its value and usability. It is sometimes difficult to determine whether something should be regarded as data or metadata. In fact, the same piece of information can qualify as either data or metadata, depending on the context in which it is used and on the needs of the user.

Format

Metadata statements are typically structured according to defined metadata schemas, standards and models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries and metadata registries can be used to standardize the metadata further. Different schemas and vocabularies have been implemented to describe different types of resources, and each generally comes with its own rules for how the metadata must be formulated or encoded. While certain metadata schemas are syntax-independent (i.e. they prescribe no rules as to how data should be recorded), others require a specific syntax (i.e. metadata has to be recorded in a specific format). The semantics of a metadata schema depend on the vocabularies used, which determine the meaning of the different metadata elements.
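As a rough illustration of how a schema and a controlled vocabulary constrain a record, the following Python sketch checks a record against a small set of rules. The element names, the required/optional flags and the vocabulary values are all hypothetical and are not taken from any published standard.

```python
# Hypothetical controlled vocabulary for a "type" element (illustrative values only).
RESOURCE_TYPES = {"book", "article", "photograph", "map"}

# Hypothetical minimal schema: element name -> whether the element is required.
SCHEMA = {"title": True, "creator": True, "type": True, "date": False}


def validate(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    # Every required element must be present and non-empty.
    for element, required in SCHEMA.items():
        if required and not record.get(element):
            problems.append(f"missing required element: {element}")
    # Elements that the schema does not define are reported.
    for element in record:
        if element not in SCHEMA:
            problems.append(f"unknown element: {element}")
    # The "type" value must come from the controlled vocabulary.
    value = record.get("type")
    if value and value not in RESOURCE_TYPES:
        problems.append(f"type not in controlled vocabulary: {value}")
    return problems


record = {"title": "Moby-Dick", "creator": "Herman Melville", "type": "novel"}
print(validate(record))
```

Here the schema fixes which elements may appear, while the vocabulary fixes what a particular element is allowed to mean, which is why "novel" is reported even though the record is otherwise complete.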

Why is metadata useful?

The purpose of metadata is to attach information to data, so that it can be subsequently discovered and used.

Data stored in library catalogues, archives or museums can refer to various kinds of works: literary works such as books, journals, articles or manuscripts; artistic works such as paintings, drawings, photographs, or maps; musical works recorded in various media; as well as any kind of audiovisual and multimedia works. These works generally come with no precise information about their inherent properties and legal attributes. Additional information can be provided so as to make the data more useful and valuable to the user. This might include data concerning the type of work, the date of creation and of first publication; information on the author and the content of the work; but also information related to the rights vested in the work as a whole, or in each of its constituent parts. This is what constitutes the metadata.

When released in a machine-readable format, metadata enables automated discovery of data, as well as its correct use and attribution.

Metadata can be very useful to:

Find information that matches certain criteria.
Better understand the details and characteristics of the information found.
Help other people find and use that information later on.

How is it produced?

Producing metadata can be a very challenging task, one that is often carried out collaboratively by a variety of actors. Detailed description of materials is often limited by the amount of information known about each item, and completing it may require significant research. Structural and administrative metadata is either automatically generated, or provided by the institution responsible for the digitization or the collection of the described resource. Descriptive metadata is generally provided by the institution responsible for the production or the publication of the resource. However, it is sometimes produced by researchers and information professionals whose task is to retrieve the information needed to produce proper descriptive metadata.

Although it can be costly and time-consuming to produce, metadata adds value to the bibliographic records. The choice of the metadata standard depends both on the costs of implementation and on the expected usage of the data:

Detailed, Flexible & Extensible Implementation

RDF and SPARQL provide advanced tools for the description, identification and management of Open Bibliographic Data. However, a proper RDF database cannot be built without a significant investment of time and money. Although digital resources need more precise descriptions so that they can be searched and identified, this level of detail is not realistic for many large-scale digitization projects.
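To give a feel for the RDF data model, the sketch below stores statements as (subject, predicate, object) triples in plain Python and answers a question by matching triple patterns, which is the basic idea behind a SPARQL query. It is not an RDF store or a SPARQL engine; the "ex:" identifiers and the data are hypothetical, and "dc:" is used only as a shorthand for Dublin Core terms.

```python
# Each statement is a (subject, predicate, object) triple, as in the RDF data model.
# All identifiers and values below are hypothetical sample data.
TRIPLES = [
    ("ex:book1", "dc:title", "Moby-Dick"),
    ("ex:book1", "dc:creator", "ex:melville"),
    ("ex:book2", "dc:title", "Typee"),
    ("ex:book2", "dc:creator", "ex:melville"),
    ("ex:melville", "ex:name", "Herman Melville"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# Step 1: find every resource whose creator is ex:melville.
works = [s for s, _, _ in match((None, "dc:creator", "ex:melville"))]
# Step 2: look up the title of each of those resources.
titles = [o for work in works for _, _, o in match((work, "dc:title", None))]
print(titles)
```

Because the author is itself a resource with its own statements, new facts about it can be added without changing the records that point to it. That flexibility is part of what makes the model extensible, and also part of what makes it costly to set up properly.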

Simple, Rapid & Low-cost Implementation

Lightweight, ad hoc metadata formats are designed for the rapid proliferation of Open Bibliographic Data.

A variety of tools, both free and commercial, have been developed to support and facilitate the creation and editing of metadata. These include, for instance:

Templates allowing users to enter values into predefined fields for a particular element set. The template system then automatically generates a properly formatted set of element attributes and values.
Mark-up tools allowing users to structure metadata attributes and values according to a particular metadata schema, e.g. XML or SGML Document Type Definitions.
Extraction tools allowing users to automatically generate metadata records from digital resources (usually of a literary nature). The more sophisticated these tools are, the higher the quality of the metadata generated, although the resulting metadata should always be reviewed manually.
Conversion tools allowing users to translate metadata records from one format to another. Again, although these tools generally produce accurate results, the resulting metadata should always be reviewed manually.
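The template approach described in the list above can be sketched in a few lines of Python: the user supplies values for predefined fields, and the program emits a formatted XML record. The choice of fields and the "record" wrapper element are assumptions made for this illustration; only the Dublin Core element set namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical template: the predefined fields a user is asked to fill in.
TEMPLATE_FIELDS = ("title", "creator", "date")


def fill_template(values):
    """Build an XML record from user-supplied values for the template fields."""
    root = ET.Element("record")  # the wrapper element name is an assumption
    for field in TEMPLATE_FIELDS:
        if field in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = values[field]
    return ET.tostring(root, encoding="unicode")


print(fill_template({"title": "Moby-Dick", "creator": "Herman Melville", "date": "1851"}))
```

Values for fields that are not part of the template are ignored in this sketch; a real tool would more likely reject them or report them to the user.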

Open Metadata Registry

The Open Metadata Registry is a site that allows you to create RDF data sets and vocabularies in a simple user interface. It is the home to the RDA elements in RDF, as well as a number of IFLA sets, such as FRBR and ISBD.

You can experiment with creating vocabularies and metadata elements in the Open Metadata Registry Sandbox. You will need to set up a logon id and password. After that you will see the "(add)" link beside "Vocabularies" and "Elements" on the upper right. Feel free to look at what others have done and to create your own metadata. Once you have filled in the information for an element or term and saved it, you will then be able to see the result in RDF by clicking on the link on the bottom right.

How is it used?

Metadata is employed by many libraries to catalog resources such as books, periodicals, DVDs, web pages or digital images. Metadata is stored in the integrated library management system (ILMS) using a particular metadata standard. The purpose is to direct people to the physical or electronic location of the items or areas they seek, as well as to provide a description of the items in question.

In the bibliographic context, metadata can be used for the purposes of:

Identifying resources (Items and Collections)

Metadata can be used to identify bibliographic resources (either items or collections of items). On the Internet, this is often achieved by means of unique identifiers - such as ISBN/ISSN, DOI (Digital Object Identifier), PURL (Persistent URL) or standard URL (Uniform Resource Locator). Metadata can also be used to retrieve the information concerning a bibliographic resource, given its identification, or vice versa, to retrieve the identifier of a resource, given a specific set of identifying criteria.
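As a small illustration of working with identifiers, the sketch below validates the check digit of an ISBN-13 and performs the two lookups mentioned above: from an identifier to its record, and from a set of criteria back to matching identifiers. The catalogue contents and the function names are hypothetical.

```python
def is_valid_isbn13(isbn):
    """Check an ISBN-13 check digit; hyphens and spaces are ignored."""
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # Digits are weighted 1, 3, 1, 3, ... and the weighted sum must be a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(cleaned))
    return total % 10 == 0


# Hypothetical catalogue keyed by identifier (placeholder titles and authors).
CATALOGUE = {
    "978-0-306-40615-7": {"title": "Example Title A", "creator": "Example Author"},
    "978-3-16-148410-0": {"title": "Example Title B", "creator": "Example Author"},
}


def find_identifiers(**criteria):
    """Return the identifiers of all records matching every given criterion."""
    return [
        isbn for isbn, rec in CATALOGUE.items()
        if all(rec.get(key) == value for key, value in criteria.items())
    ]


print(is_valid_isbn13("978-0-306-40615-7"))       # identifier check
print(CATALOGUE["978-0-306-40615-7"]["title"])    # identifier -> record
print(find_identifiers(title="Example Title B"))  # criteria -> identifier
```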

Resource Discovery

Given the large amount of information available nowadays, it is increasingly necessary to facilitate the discovery of particular resources according to specific criteria. Metadata can be extremely useful in this context, in so far as it:

allows for a better identification of resources.
allows for resources to be searched according to specific keywords or criteria.
facilitates the identification of similarities / dissimilarities between different resources. This facilitates the collection / aggregation of resources sharing similar criteria.
enhances the quality of automated searches, as search engines can better understand the context, details and contents of different resources.
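Searching by keyword, as described in the list above, is often implemented with an inverted index that maps each keyword to the records carrying it. The following Python sketch is a minimal version under that assumption; the record identifiers and subject keywords are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical records with subject keywords as descriptive metadata.
RECORDS = {
    "r1": {"title": "Moby-Dick", "subject": ["whaling", "sea stories"]},
    "r2": {"title": "Typee", "subject": ["sea stories", "polynesia"]},
    "r3": {"title": "Walden", "subject": ["nature", "simple living"]},
}


def build_index(records):
    """Map each subject keyword to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for keyword in record["subject"]:
            index[keyword].add(record_id)
    return index


def search(index, *keywords):
    """Return the ids of records tagged with all of the given keywords."""
    matches = [index.get(keyword, set()) for keyword in keywords]
    return set.intersection(*matches) if matches else set()


index = build_index(RECORDS)
print(sorted(search(index, "sea stories")))
print(sorted(search(index, "sea stories", "whaling")))
```

Records that share a keyword end up in the same index entry, so the same structure also supports the aggregation of similar resources.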

Cataloguing

The aggregation of different resources according to specific criteria is particularly useful for the purpose of organisation and classification. Thanks to metadata, different collections of resources can be created dynamically, according to the audience or the purpose for which those resources are being retrieved. In this context, metadata can be useful for:

describing individual resources: documents, pages, images, audio files, etc.
describing the content of collections: Web sites, databases, directories, etc.
describing relationships among resources: tables of contents, chapters, images, site maps
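The idea of building collections dynamically can be illustrated by grouping the same records under different metadata elements, depending on the purpose at hand. In this Python sketch the records and element names are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalogue records.
RECORDS = [
    {"title": "Moby-Dick", "type": "book", "language": "en"},
    {"title": "Les Misérables", "type": "book", "language": "fr"},
    {"title": "Map of Paris", "type": "map", "language": "fr"},
]


def collect_by(records, element):
    """Group record titles into collections keyed by the value of one element."""
    collections = defaultdict(list)
    for record in records:
        collections[record.get(element, "unknown")].append(record["title"])
    return dict(collections)


# The same records yield different collections depending on the criterion.
print(collect_by(RECORDS, "type"))
print(collect_by(RECORDS, "language"))
```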

Archiving and Preservation

With the advent of digital technologies, preservation is becoming a growing concern. Digital resources are more fragile than physical resources, in that they can be easily lost or corrupted (whether intentionally or not). Digital media can also be damaged, or their technology (software or hardware) might become obsolete and therefore no longer usable. Metadata can help ensure that resources remain available in the future by making sure that one copy is always accessible. Metadata can also keep track of the history of a digital resource: where it comes from, the changes it has gone through, etc. Several metadata schemas have already been developed to facilitate the digital preservation of bibliographic resources. See e.g. the initiatives of the National Library of Australia, the British Cedars Project (CURL), the OCLC Working Group and the Research Libraries Group. See, in particular, the PREMIS initiative (PREservation Metadata: Implementation Strategies) endorsed by the OCLC and the Research Libraries Group. Most of these initiatives are compatible with the OAIS standard (ISO Reference Model for an Open Archival Information System).
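One common way for preservation metadata to detect loss or corruption is to store a checksum when a resource is ingested and to recompute it later, logging each check as part of the resource's history. The Python sketch below illustrates that idea only; the field names, event types and source label are hypothetical and do not follow the PREMIS schema or any other standard.

```python
import hashlib
from datetime import datetime, timezone


def _event(kind, outcome):
    """Build one history event with a UTC timestamp."""
    return {"type": kind, "outcome": outcome,
            "when": datetime.now(timezone.utc).isoformat()}


def ingest(data, source):
    """Create a preservation record holding a SHA-256 checksum and the origin."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "source": source,
        "events": [_event("ingest", "success")],
    }


def verify(data, record):
    """Recompute the checksum, log the check as an event, and return the result."""
    intact = hashlib.sha256(data).hexdigest() == record["sha256"]
    record["events"].append(_event("fixity check", "pass" if intact else "fail"))
    return intact


record = ingest(b"scanned page bytes", source="hypothetical-scanner-01")
print(verify(b"scanned page bytes", record))  # unchanged copy
print(verify(b"scanned page bytez", record))  # corrupted copy
print([event["type"] + ": " + event["outcome"] for event in record["events"]])
```

The event list doubles as a simple provenance trail: it records where the resource came from and every check it has undergone since.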

Interoperability

Describing resources with a proper metadata format allows the description to be understood by both humans and machines.
Defined metadata schemas enable the exchange of information between different systems with only a minimal loss of information, by means of shared transfer protocols or crosswalks between different schemas.
The use of standard metadata schemas enables users to search for specific resources across several databases that use similar or interoperable formats.
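A crosswalk between two schemas is, at its simplest, a table that maps the element names of one schema onto those of another. The Python sketch below assumes a hypothetical local schema mapped onto Dublin Core element names, and reports any fields that have no equivalent, since that is where information is lost.

```python
# Hypothetical mapping from a local schema to Dublin Core element names.
CROSSWALK = {"main_title": "title", "author": "creator", "pub_year": "date"}


def crosswalk(record, mapping=CROSSWALK):
    """Translate a record to the target schema.

    Returns the converted record and the list of source fields that have
    no equivalent in the mapping and were therefore left out.
    """
    converted, dropped = {}, []
    for field, value in record.items():
        if field in mapping:
            converted[mapping[field]] = value
        else:
            dropped.append(field)
    return converted, dropped


local_record = {
    "main_title": "Moby-Dick",
    "author": "Herman Melville",
    "pub_year": "1851",
    "shelf_mark": "hypothetical-shelf-12",
}
converted, dropped = crosswalk(local_record)
print(converted)
print(dropped)
```

Returning the unmapped fields alongside the result makes the loss explicit, so that it can be reviewed rather than silently discarded.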

More recent and specialized uses of library metadata include digital libraries such as e-print repositories and digital image libraries. Given the custom nature of the materials included, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords or copyright statements. Standard file information such as file size and format is usually included automatically.

Standardization for library operation has been a key topic in international standardization (ISO) for decades. Standards for metadata in digital libraries include Dublin Core, METS, MODS, DDI, ISO standard Digital Object Identifier (DOI), ISO standard Uniform Resource Name (URN), PREMIS schema, Ecological Metadata Language, and OAI-PMH.