ETD Guide/Technical Issues/Identifying: URN, PURL, DOI


Resources distributed on the Internet are reached through a syntax that corresponds to their physical location. This syntax is defined in RFC 1738 and is known as a Uniform Resource Locator (URL). Addressing resources by location creates a familiar problem. Who has not encountered the HTTP error 404 Not Found, which indicates that the server cannot find the requested resource at that location? The error does not necessarily mean that the resource is gone from the server; it may simply have been moved. Because a URL is not updated automatically when a resource moves, links to the old location break.

While the URL identifies the address of a resource, the Uniform Resource Name (URN) identifies the resource itself, the unit of information, much as the ISBN does for books. To draw a parallel, the URL corresponds to a person's postal address, while the URN corresponds to that person's social insurance or social security number. The URN is thus attached to a resource and not to a physical address. Anyone who knows this identifier can find the resource even if its physical address changes. A URN therefore represents an institutional commitment to preserving access to a resource on the Internet.

In the Université de Montréal's digital thesis pilot project, undertaken in 1999-2000, we implemented a system for producing URNs based on the model proposed by the CNRI (Corporation for National Research Initiatives). A global server based at the CNRI manages "naming authorities", which correspond to publisher numbers. A local server installed at the thesis distribution station in turn houses a database that manages the associations between URNs and URLs. All of this closely resembles the system Network Solutions uses to manage the DNS, which maps names to the IP addresses of computers linked to the Internet, except that in our case it is documents that are given addresses rather than computers.
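The association the local server maintains can be pictured as a lookup table from URN to current URL. The following is only a minimal sketch of that idea, not the CNRI software: the class name `UrnRegistry`, its methods, the identifier `1012/example-thesis`, and the `example.org` addresses are all hypothetical.

```python
class UrnRegistry:
    """Toy stand-in for a local database of URN-to-URL associations."""

    def __init__(self):
        # Maps each URN (a string) to the resource's current URL.
        self._locations = {}

    def register(self, urn, url):
        """Record the first known location of a resource."""
        self._locations[urn] = url

    def move(self, urn, new_url):
        """Update the location after the resource has been moved."""
        if urn not in self._locations:
            raise KeyError(f"unknown URN: {urn}")
        self._locations[urn] = new_url

    def resolve(self, urn):
        """Return the current URL for a URN; raises KeyError if unknown."""
        return self._locations[urn]


registry = UrnRegistry()
# Hypothetical identifier and addresses, for illustration only.
registry.register("1012/example-thesis", "https://old.example.org/theses/example.pdf")
registry.move("1012/example-thesis", "https://new.example.org/etd/example.pdf")
current_url = registry.resolve("1012/example-thesis")
```

The URN handed out to readers never changes; only the table entry is updated when the file moves, which is what spares them the 404 error.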

The model proposed by the CNRI is the Handle System. It is also the cornerstone of the International DOI Foundation's system. A Handle is built in two parts. The prefix corresponds to the publisher's number (the Université de Montréal's publisher's number is 1012). This number is unique and cannot be used by any other organization. "Sub-names" can be appended to this number in order to subdivide it into more precise units. The prefix is followed by a slash ("/") and a freely chosen alphanumeric sequence. Thus, a Handle-type URN for theses reads as follows:


We chose the year of the thesis defense, the author's name, the author's date of birth, and the file format as the constituent elements of a thesis's URN identifier. Note that one must first download the CNRI's plugin in order to use the Handle System. The system has the advantage of conforming fairly closely to the requirements that RFC 1737 sets out for URN systems. Its use is nevertheless tedious, since the plugin is required to resolve the links. After experimenting with the CNRI's system, the Université de Montréal intends to use another system for its ongoing electronic theses project.
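The two-part structure of a Handle, and the way the chosen elements could be combined into its suffix, can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual scheme: the hyphen separator, the order of the elements, the dot used to delimit sub-names, and the sample values (`Doe`, `19700101`, `pdf`) are all hypothetical. Only the prefix 1012 comes from the text above.

```python
def build_thesis_handle(prefix, year, author, birth_date, file_format):
    """Join the four chosen elements into a suffix and attach the prefix.

    The "-" separator and the element order are assumptions made for
    this sketch; the text does not specify them.
    """
    suffix = "-".join([str(year), author, birth_date, file_format])
    return f"{prefix}/{suffix}"


def split_handle(handle):
    """Split a Handle into (naming authority, sub-names, suffix).

    Only the first slash separates prefix from suffix, so the freely
    chosen suffix may itself contain slashes. Sub-names are assumed to
    be dot-separated within the prefix.
    """
    prefix, slash, suffix = handle.partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    authority, *sub_names = prefix.split(".")
    return authority, sub_names, suffix


# Hypothetical thesis: defended in 2000, author Doe, born 1970-01-01, PDF file.
handle = build_thesis_handle("1012", 2000, "Doe", "19700101", "pdf")
authority, sub_names, suffix = split_handle(handle)
```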

Another interesting avenue is the PURL (Persistent URL) system created by OCLC. Note that a document attached to a PURL can be modified, unlike under some other URN standards and applications. The PURL system largely follows the same principle as the Handle System, except that identifiers are resolved through an ordinary URL. This solution has the advantage of not requiring a plugin. In fact, a PURL is a URL. Rather than pointing directly to an Internet resource, a PURL points to an intermediary resolution service. This service associates the PURL with the resource's current URL and returns that URL to the client, which then retrieves the resource in the usual way. PURLs can be registered with an intermediary service (such as OCLC's), or the service can be installed on one's own server.
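The resolution step can be sketched as a function that looks up the PURL's path and answers with an HTTP redirect to the current URL. This is a minimal sketch, not OCLC's implementation: the table `PURL_TABLE`, the function `resolve_purl`, the choice of status code 302, and the `example.org` addresses are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical resolver table: PURL path -> current URL of the resource.
PURL_TABLE = {
    "/etd/example-thesis": "https://theses.example.org/2000/example-thesis.pdf",
}


def resolve_purl(purl, table=PURL_TABLE):
    """Return (status, headers) as a resolution service might answer.

    A known PURL yields a redirect whose Location header carries the
    current URL; an unknown one yields 404 with no headers.
    """
    path = urlsplit(purl).path
    target = table.get(path)
    if target is None:
        return 404, {}
    return 302, {"Location": target}


status, headers = resolve_purl("https://purl.example.org/etd/example-thesis")
```

Because the answer is an ordinary HTTP redirect, any web browser can follow it without extra software, which is the practical difference from the plugin-based approach described earlier.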


