Legal framework of textual data processing for Machine Translation and Language Technology research and development activities/Public Sector Information (PSI) Case Studies

Case #1: Uploading/copying public data (normally under the PSI Directive) to a repository

Case description
Actor: Repository manager
Intended use: Upload the JRC-Acquis dataset to my repository
Conditions: The dataset is accompanied by the "Usage Conditions/Licensing Issues" text at http://ipsc.jrc.ec.europa.eu/?id=198#c2726
Question: Should I upload the dataset and also provide a link to the original site, or just describe it with metadata, add attribution information, and link to the original site for downloading?

Suggested legal solution
Legal position: The licensing information is slightly confusing: it accepts that the AC (Acquis Communautaire) corpus is in the public domain, but then imposes conditions on its use (attribution and non-endorsement). One way to read this is that the individual legislative documents are in the public domain (PD), while the complete database (the corpus) is protected by copyright or the sui generis database right, so that the licence applies only to the protected parts of the corpus. It is also debatable whether the corpus falls under Decision 2011/833/EU. In any case, the licensing terms satisfy the conditions of that re-use decision. Note that the Eurovoc Thesaurus does not fall under the re-use decision, and special permission is required for its re-use.
Suggested course of action: Keep the attribution notice and the metadata together with the non-endorsement note. You may upload and share the material through your repository as long as you adhere to these conditions.
Type of terms and conditions: Attribution, non-endorsement, copyright notices
Legal basis: Copyright law, Decision 2011/833/EU
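
The suggested course of action can be made concrete as a repository record that keeps the notices next to the metadata. The following is a minimal sketch, not a prescribed schema: the class RepositoryRecord, its field names, and the helper missing_notices are hypothetical, and the notice texts are placeholders that would have to be copied verbatim from the usage conditions page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepositoryRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    conditions_url: str          # where the usage conditions are published
    attribution_notice: str      # placeholder: copy verbatim from the source
    non_endorsement_notice: str  # placeholder: copy verbatim from the source
    legal_basis: List[str] = field(default_factory=list)


def missing_notices(record: RepositoryRecord) -> List[str]:
    """Return the names of required notice fields that are empty or blank."""
    required = ("conditions_url", "attribution_notice", "non_endorsement_notice")
    return [name for name in required if not getattr(record, name).strip()]


record = RepositoryRecord(
    title="JRC-Acquis",
    conditions_url="http://ipsc.jrc.ec.europa.eu/?id=198#c2726",
    attribution_notice="<attribution text from the usage conditions>",
    non_endorsement_notice="<non-endorsement text from the usage conditions>",
    legal_basis=["Copyright law", "Decision 2011/833/EU"],
)

# Only publish the record when no required notice is missing.
if not missing_notices(record):
    print("Record carries all required notices:", record.title)
```

A check of this kind only confirms that the notices are present; it does not assess whether their wording matches the source.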

Case #3: Uploading/copying "open" data to a repository

Case description
Actor: Repository manager
Intended use: Upload the OpenSubtitles dataset to my repository
Conditions: I want to copy it from the http://opus.lingfil.uu.se/OpenSubtitles.php site, which says: "IMPORTANT: If you use the OpenSubtitle corpus, please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data! I got the data under this condition!"
Question: Can I upload the dataset and link to the original site? If so, which is the original site in this case: http://opus.lingfil.uu.se/OpenSubtitles.php or http://www.opensubtitles.org/en? Or should I just describe it with metadata, add attribution information, and link to the original site for downloading?

Suggested legal solution
Legal position: The subtitles and the subtitles database are protected by copyright and are licensed under a custom-made open licence. Custom-made open licences are licences with minimal conditions (e.g. attribution and copyleft) that were written for a specific work or set of works but could potentially interoperate with other open licences. This particular licence requires only a reference to the original site, i.e. http://www.opensubtitles.org/, in order to allow all uses of the work. No further attribution or notices are required, since attribution through the URL is meant to cover them all.
Suggested course of action: Upload and share the dataset after including the URL (i.e. http://www.opensubtitles.org/) as instructed in the licence.
Type of terms and conditions: Attribution
Legal basis: Copyright law
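
Since the only condition here is the link to http://www.opensubtitles.org/, a repository could check for it before publishing a dataset description. The sketch below is an illustration under that assumption; the function name has_required_link is hypothetical, and the check is a plain substring match rather than full URL validation.

```python
REQUIRED_LINK = "http://www.opensubtitles.org/"


def has_required_link(text: str, required: str = REQUIRED_LINK) -> bool:
    """Return True if the text contains the required link.

    The comparison ignores case and an optional trailing slash.
    """
    return required.rstrip("/").lower() in text.lower()


description = (
    "OpenSubtitles corpus, obtained via http://opus.lingfil.uu.se/OpenSubtitles.php. "
    "Original data: http://www.opensubtitles.org/"
)

if has_required_link(description):
    print("Description includes the required link.")
```

Linking to the OPUS page as well, as in the example description, is optional; the condition quoted above asks only for the link to http://www.opensubtitles.org/.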