Legal framework of textual data processing for Machine Translation and Language Technology research and development activities/MT-MP:Lexicon Distribution Case Studies

Case #16 Lexicon Distribution (based on other datasets) I

Case description
Actor Researcher-Resource Compiler & Provider
Intended use Distribute a lexicon I have created on the basis of web crawled data (lemma selection, lemma frequency)
Conditions I have little or no information on the legal status of the web-crawled texts, since I only used the dataset to extract specific information from it
Question Can I distribute it under the licence of my choice?

Or do I check all the sources again and ask for permission, which can be extremely time-consuming? Or do I refrain from distributing it?

Suggested legal solution
Legal position See Cases #8 – 10
Suggested course of action Key suggestions: (a) the closer my work is to the original, the greater the risk; (b) if what I have produced can substitute for the original, I should not distribute my work; (c) if the original cannot be recognised or reconstructed, the risk is very low; (d) ideally, I should not use it for commercial purposes; (e) if the risk is high, I should include a notice-and-take-down notice.
Type of Terms and Conditions Multiple. NC element suggested.
Legal basis Copyright Law. Emphasis on limitations and exceptions.

Case #17 Lexicon Distribution (based on other datasets) II

Case description
Actor Researcher-Resource Compiler & Provider
Intended use Distribute a lexicon I have created including web crawled data used in the form of examples
Conditions I have little or no information on the legal status of the web-crawled texts, since I only used the dataset to extract specific information from it
Question Can I distribute it under the licence of my choice?

Or do I check all the sources again and ask for permission, which can be extremely time-consuming? Or do I refrain from distributing it?

Suggested legal solution
Legal position See Case #16.
Suggested course of action Same as Case #16. The inclusion of my own or cleared content further reduces the risks of infringement.
Type of Terms and Conditions Multiple. NC element suggested.
Legal basis Copyright Law. Emphasis on limitations and exceptions.

Case #18 Lexicon Distribution (based on other datasets) III

Case description
Actor Researcher-Resource Compiler & Provider
Intended use Distribute a lexicon I have created which includes definitions (paraphrased but also verbatim) from other lexica (one or more)
Conditions The licence of the original lexica is not always clear (e.g. nothing is stated for the Dictionary of Modern Greek, while the website of the Triantafylides dictionary includes the following terms of use: http://www.greek-language.gr/greekLang/terms/index.html)
Question Can I distribute it under the licence of my choice?

Or do I ask the sources for permission? Or simply state the sources (attribution-like)? Or refrain from distributing it?

Suggested legal solution
Legal position Including definitions from another lexicon will most probably constitute extraction of substantial parts of another database, even if the definitions are few. The reason is that, if they are worth extracting, they are significant and thus by definition substantial. For these, permission should be sought. Clearly, the fewer the definitions, the lower the risk. If the definitions are paraphrased, there is no other way to define the term, and the structure of the lexicon is not copied, then there should be no problem, since facts and ideas are not protected by copyright. A prohibition on paraphrasing in the terms and conditions of a website is valid only to the extent that the paraphrase copies original elements of the expression of the original definition. Otherwise, ideas are not protected.
Suggested course of action Paraphrase where there is no other way to say something. Avoid copying any original elements. Avoid excessive verbatim copying.
Type of Terms and Conditions Multiple.
Legal basis Copyright Law. Emphasis on limitations and exceptions.