At our various locations, document servers for electronic theses and dissertations have been set up independently of each other. The aim is to build a network of sites that allows worldwide retrieval within a heterogeneous knowledge base, regardless of the physical location of the data provided. Users should not have to navigate and search the various servers separately. They should be provided with one retrieval interface that can link to all the different nodes of the network of ETD sites. This is the horizontal level on which retrieval could take place.
Another level, which we call the vertical level of the information portal, is the one that configures the retrieval interface so that the user retrieves only the relevant and desired information rather than everything that could possibly be provided. We wish to avoid the "Altavista effect" of information overload. The user should be able to search within specific subjects and for specific information structures. For example, they should be able to search just within the author field or the title field, for certain keywords or institutions, or just within the abstract field. A highly sophisticated retrieval facility would allow a worldwide search within certain internal document structures, such as the bibliography.
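The two levels can be illustrated with a minimal sketch. It assumes each node of the network exposes its records as a simple in-memory list; the record fields, node names and sample data are hypothetical and stand in for whatever the participating servers actually provide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """A minimal ETD metadata record; the field names are illustrative."""
    author: str
    title: str
    abstract: str
    institution: str
    keywords: List[str] = field(default_factory=list)


def search_node(records, term, fields):
    """Vertical level: return records where `term` occurs in one of `fields`."""
    term = term.lower()
    hits = []
    for rec in records:
        for name in fields:
            value = getattr(rec, name)
            text = " ".join(value) if isinstance(value, list) else value
            if term in text.lower():
                hits.append(rec)
                break  # one matching field is enough for this record
    return hits


def search_network(nodes, term, fields):
    """Horizontal level: query every node and merge the results."""
    results = []
    for node_name, records in nodes.items():
        for rec in search_node(records, term, fields):
            results.append((node_name, rec))
    return results


# Hypothetical sample data for two nodes of the network.
nodes = {
    "node-a": [Record("Meier, A.", "Metadata for theses", "On Dublin Core.",
                      "University A", ["metadata"])],
    "node-b": [Record("Silva, B.", "Coastal erosion", "Mentions metadata once.",
                      "University B", ["geology"])],
}

# Restricting the search to the title field excludes the incidental mention.
title_hits = search_network(nodes, "metadata", fields=["title"])
```

Searching the title field alone returns only the record from the first node, whereas adding the abstract field would also return the second; this is the kind of narrowing the vertical level is meant to offer.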
For the scientific use of theses in the humanities and social sciences, as well as in the natural and technical sciences, it is necessary to offer not only bibliographical metadata and full text but also structural information for retrieval purposes, such as:
- the table of contents;
- captions of tables and graphs;
- special index terms (such as name or person indexes, location indexes, etc.);
- references (links) to external sources (printed resources as well as Web sources);
- the bibliography;
- references or footnotes within the work;
- mathematical / chemical formulas;
- theses / hypotheses.
These structural metadata are an integral part of the document and have to be defined by the author. At present, this predominantly takes place while formatting the text (e.g. headings, footnotes, etc.). To make these structural data usable for retrieval as well, the author must tag them as such, using either a structured language like LaTeX or "style sheets" as in WinWord.
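As a rough illustration of why explicit tagging matters, the sketch below pulls a few structural elements out of a LaTeX source with a regular expression. It is deliberately simplified: it assumes the command arguments contain no nested braces, and the choice of commands and the sample text are illustrative only.

```python
import re

# Matches \command{argument} (optionally starred) for a few structural commands.
# Assumption: arguments contain no nested braces.
STRUCTURE_PATTERN = re.compile(
    r"\\(section|subsection|caption|footnote)\*?\{([^{}]*)\}"
)


def extract_structure(latex_source):
    """Group the arguments of structural LaTeX commands by command name."""
    found = {}
    for command, argument in STRUCTURE_PATTERN.findall(latex_source):
        found.setdefault(command, []).append(argument.strip())
    return found


# Hypothetical fragment of a thesis written in LaTeX.
sample = r"""
\section{Introduction}
Theses are rarely indexed below the title level.\footnote{See the project report.}
\begin{figure}
\caption{Retrieval architecture}
\end{figure}
\section*{Bibliography}
"""

structure = extract_structure(sample)
```

Because the author marked headings, captions and footnotes with dedicated commands, they can be recovered mechanically; text that was only formatted visually (for example, a heading set in bold) would not be found this way.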
What could be the lowest common denominator for interoperability?
The first step in the above-mentioned development is to reach agreement on a common metadata set for theses and dissertations and to formulate guidelines on how to use it for ETD projects. See http://www.ndltd.org/standards/metadata/ for the Dublin Core metadata set proposed by the NDLTD. Those guidelines could be supported by additional free software tools, which would allow library staff to create the necessary metadata set without needing technical knowledge of the actual encoding in HTML or XML/RDF. Such a "metamaker" has been developed for the German ETD projects and could be translated into English, French, Spanish and Portuguese. MySQL or other free software can be used as the underlying database system.
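To give an idea of what such a tool does behind the form that library staff fill in, here is a minimal sketch that turns a list of field values into Dublin Core elements in XML. It is not the German "metamaker" itself; the wrapper element name `metadata`, the function name and the sample values are assumptions made for illustration. Only the Dublin Core element namespace is taken from the Dublin Core specification.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_dc_record(fields):
    """Build a wrapper element with one dc:* child per (name, value) pair.

    The wrapper element name "metadata" is a placeholder, not a standard.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values as they might be entered in a cataloguing form.
record = make_dc_record([
    ("title", "An example thesis title"),
    ("creator", "Example, Author"),
    ("type", "Electronic Thesis or Dissertation"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

The person entering the data sees only the field names and values; the namespace handling and the encoding are left to the tool.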
The Open Archives Specification: A chance for metadata interoperability
During the last two years, one initiative has enormously affected the discussions about interoperability in digital libraries and within the digital library community.
The development of
- a protocol that can easily be implemented at archive servers and
- a metadata set based upon the Dublin Core metadata set
allows archives such as ETD servers, preprint archives, museums and other institutions to provide their local catalogues to a worldwide community without having to implement specialised and complicated interfaces.
Thus the Open Archives Framework (see http://www.openarchives.org) allows heterogeneous and distributed ETD archives and servers to interoperate at a very basic level.
For the ETD initiatives and projects, OAI compliance should be seen as a chance to connect ETD servers worldwide.
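The simplicity of the protocol can be seen from how a request is formed: a harvester sends an ordinary HTTP GET request with a `verb` parameter and a few arguments. The sketch below only builds such a request URL. It assumes the request syntax of version 2.0 of the OAI protocol (OAI-PMH), which may differ in detail from the version of the specification discussed here, and the base URL is hypothetical.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Return the GET URL for an OAI request with the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Hypothetical base URL of an ETD server's OAI interface.
BASE_URL = "http://etd.example.org/oai"

# Ask the server for all records in unqualified Dublin Core ("oai_dc").
url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

A service provider would issue the same kind of request against every participating ETD server and merge the returned Dublin Core records into one searchable index.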