User:Pfctdayelise/Using Wikipedia as a resource for computational linguistics
The aim of this book is to outline some areas of research in computational linguistics or natural language processing where Wikipedia and, by extension, the other Wikimedia projects have the potential to be valuable resources. It is not intended to serve as an introduction to either of these fields and does not assume any knowledge of Wikipedia.
Description of Wikimedia projects
Wiktionary, Wikinews, Commons, Wikibooks, Wikisource, Wikiquote. Languages. Meta and Commons host direct translations of help (etc.) pages. Who contributes? Growth.
Database dumps
Post-processing (pre-processing?) tools
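Wikipedia dumps are distributed as XML files in the MediaWiki export format, with one `<page>` element per page and the wikitext inside a `<text>` element. As a minimal sketch of a pre-processing step, assuming an uncompressed pages-articles style dump, the code streams (title, wikitext) pairs using only the Python standard library. `iter_pages` and `local_name` are hypothetical helper names, the sample XML is made up, and the `export-0.10` namespace URI in it is an assumption; stripping the namespace keeps the parser independent of the export version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export.

    `source` is a filename or a binary file object. Each <page> element is
    cleared after it has been yielded, to limit memory use on large dumps.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()


# Made-up two-page export for illustration; the namespace URI is an assumption.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Dog</title>
    <ns>0</ns>
    <revision><text>The dog is a [[mammal]].</text></revision>
  </page>
  <page>
    <title>Domestic dog</title>
    <ns>0</ns>
    <revision><text>#REDIRECT [[Dog]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        print(title, "=>", text)
```

For a compressed dump, passing `bz2.open(path, "rb")` as the source should work the same way, since `iterparse` accepts any binary file object.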
Description of the English Wikipedia
- License
- Accessible - dumps
- Coverage - biased towards pop-culture and geek topics (best coverage); WikiProjects
- Format - MOS (Manual of Style), but not reliably followed
- FAs (featured articles), cleanup tags
- RDRs (redirects)
Interwiki links
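Interlanguage links such as `[[de:...]]` connect an article to the article on the same topic in another language edition, which is what makes them useful for word and phrase translation. A minimal sketch, assuming the links appear inline in the article wikitext (as they did when this outline was written; many have since moved to Wikidata) and that the caller supplies the set of valid language codes. `interlanguage_links`, `title_pairs` and `known_codes` are hypothetical names, and the sample wikitext is made up.

```python
import re

# A lowercase prefix, a colon, then a title: [[de:...]], [[simple:...]], [[zh-min-nan:...]].
# Other interwiki prefixes such as [[wikt:dog]] share this shape, so matches are
# filtered against a caller-supplied set of language codes.
PREFIXED_LINK_RE = re.compile(r"\[\[([a-z][a-z-]*):([^\]|]+)\]\]")


def interlanguage_links(wikitext, known_codes):
    """Return {language_code: foreign_title} for prefixed links with a known code."""
    links = {}
    for prefix, title in PREFIXED_LINK_RE.findall(wikitext):
        if prefix in known_codes:
            links[prefix] = title.strip()
    return links


def title_pairs(title, wikitext, known_codes):
    """Turn one article into (source_title, language_code, foreign_title) triples."""
    found = interlanguage_links(wikitext, known_codes)
    return [(title, code, foreign) for code, foreign in sorted(found.items())]


if __name__ == "__main__":
    # Made-up wikitext for illustration only.
    sample = "The '''dog''' is a mammal.\n[[Category:Mammals]]\n[[de:Haushund]]\n[[fr:Chien]]\n[[wikt:dog]]"
    print(title_pairs("Dog", sample, {"de", "fr", "es"}))
```

Collecting these triples over a whole dump gives a rough bilingual lexicon of article titles; it pairs concepts rather than individual word senses, so it is noisiest for titles that are phrases or proper names.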
Categories
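Articles are assigned to categories by `[[Category:...]]` links in their wikitext, and categories can themselves belong to other categories, so together they form a rough, not strictly hierarchical, taxonomy. A minimal sketch, assuming raw wikitext input such as (title, wikitext) pairs from a dump; `categories` and `category_index` are hypothetical names. Categories added by templates only appear in the rendered page, so this regex does not see them.

```python
import re
from collections import defaultdict

# [[Category:Name]] or [[Category:Name|sort key]]; the sort key is dropped.
# A leading colon, as in [[:Category:Name]], links to the category page
# instead of adding the article to it, so it is deliberately not matched.
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE)


def categories(wikitext):
    """Return the category names an article's wikitext assigns it to, in order."""
    return CATEGORY_RE.findall(wikitext)


def category_index(pages):
    """Invert (title, wikitext) pairs into {category: [member titles]}."""
    index = defaultdict(list)
    for title, text in pages:
        for cat in categories(text):
            index[cat].append(title)
    return dict(index)


if __name__ == "__main__":
    # Made-up pages for illustration only.
    pages = [("Dog", "[[Category:Mammals]]"),
             ("Cat", "[[Category:Mammals]]\n[[Category:Pets|Cat]]")]
    print(category_index(pages))
```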
Disambiguation pages
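Disambiguation pages list the articles an ambiguous title may refer to, usually as bulleted links followed by a short description, which makes them a ready-made sense inventory. The sketch assumes that layout and a `{{disambiguation}}`-style marker template; individual pages do not always follow either convention, and `is_disambiguation` and `senses` are hypothetical names. The sample page is made up.

```python
import re

# Hypothetical heuristic: a page is a disambiguation page if it transcludes a
# template whose name begins with "disambig", e.g. {{disambiguation}}.
DISAMBIG_TEMPLATE_RE = re.compile(r"\{\{\s*disambig", re.IGNORECASE)

# A bulleted line that opens with a wikilink; the rest of that line is the gloss.
# [ \t]* (not \s*) keeps a match from running onto the next line.
SENSE_LINE_RE = re.compile(r"^\*+[ \t]*\[\[([^\]|]+)(?:\|[^\]]*)?\]\],?[ \t]*(.*)$", re.MULTILINE)


def is_disambiguation(wikitext):
    return DISAMBIG_TEMPLATE_RE.search(wikitext) is not None


def senses(wikitext):
    """Return [(target_title, gloss)] for each bulleted entry that starts with a link."""
    return [(target.strip(), gloss.strip()) for target, gloss in SENSE_LINE_RE.findall(wikitext)]


if __name__ == "__main__":
    # Made-up disambiguation page for illustration only.
    page = ("'''Mercury''' may refer to:\n"
            "* [[Mercury (planet)]], the closest planet to the Sun\n"
            "* [[Mercury (element)]], a chemical element\n"
            "{{disambiguation}}")
    if is_disambiguation(page):
        for target, gloss in senses(page):
            print(target, "|", gloss)
```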
Possible tasks
Word sense disambiguation
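One simple way to use short sense descriptions, such as those on a disambiguation page, is a simplified Lesk algorithm: pick the sense whose gloss shares the most content words with the context of the ambiguous word. This is a swapped-in illustrative technique, not a method the outline specifies. The sketch assumes glosses arrive as a `{sense_title: gloss}` dict; `simplified_lesk`, `tokens` and the tiny stopword list are hypothetical, and the sample glosses are made up.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "on", "for"}


def tokens(text):
    """Lowercased alphabetic words minus stopwords, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, sense_glosses):
    """Return the sense whose gloss overlaps most with the context.

    Ties (including all-zero overlap) go to the sense listed first.
    """
    context_words = tokens(context)
    best, best_score = None, -1
    for sense, gloss in sense_glosses.items():
        score = len(context_words & tokens(gloss))
        if score > best_score:
            best, best_score = sense, score
    return best


if __name__ == "__main__":
    glosses = {"Mercury (planet)": "the closest planet to the Sun",
               "Mercury (element)": "a chemical element, a liquid metal"}
    print(simplified_lesk("Mercury is a toxic metal and a chemical element", glosses))
```

Disambiguation-page glosses are very short, so overlap is often zero; in practice the first paragraph of each target article would give the method more to work with.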
Word and phrase translation
Web mining, data mining
Machine translation
Geospatial term disambiguation and named entity recognition
Image analysis (?)
Synonymy, abbreviations (RDRs)
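A redirect page consists of `#REDIRECT [[Target]]`, so each redirect title is an alternative name for its target: a synonym, an abbreviation, a variant spelling, or sometimes a misspelling. A minimal sketch that groups redirect titles under their targets, assuming (title, wikitext) pairs as input; `redirect_target` and `synonym_sets` are hypothetical names, and the sample pages are made up.

```python
import re
from collections import defaultdict

# "#REDIRECT [[Target]]" at the start of the page, case-insensitive;
# a section anchor such as [[Dog#Taxonomy]] is cut off at the '#'.
REDIRECT_RE = re.compile(r"\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    m = REDIRECT_RE.match(wikitext)
    return m.group(1).strip() if m else None


def synonym_sets(pages):
    """Group redirect titles under the article they point to."""
    groups = defaultdict(set)
    for title, text in pages:
        target = redirect_target(text)
        if target is not None:
            groups[target].add(title)
    return dict(groups)


if __name__ == "__main__":
    # Made-up pages for illustration only.
    pages = [("Dog", "The dog is a mammal."),
             ("Domestic dog", "#REDIRECT [[Dog]]"),
             ("Canis familiaris", "#redirect [[Dog#Taxonomy]]"),
             ("USA", "#REDIRECT [[United States]]")]
    print(synonym_sets(pages))
```

Because redirects mix true synonyms with abbreviations and misspellings, the resulting groups are noisy and would need filtering before use as a synonym or abbreviation list.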
- http://wm.sieheauch.de/?p=48 papers
- Wiki Research Bibliography
- Michael Strube
- "Web corpus mining by instance of Wikipedia"
- Video mining