Division of labour for a scholarly database

Improving a highly specialised scientific database might seem the sort of task that is best left to experts rather than opened up to the general public. In fact, it fits the earlier description of tasks that benefit from crowdsourcing. It can be broken down into many small, independent steps, each of which can be checked individually, and at least some of the necessary skills are widely available. A crowdsourcing project does not have to leave everything to the public; it can invite them to build on work created by professionals.

Figure: Data about a protein family on Wikipedia

The Pfam protein families database, used extensively by biochemists, has benefited from bidirectional links with Wikipedia. Encyclopedia articles have been populated with information from Pfam, and corrections or expansions made on Wikipedia can be fed back into the database. This puts the Pfam data where researchers, students and other users can easily find and use it, with links to the official site for verification. It also puts any errors where they can be seen and acted upon. The Pfam maintainers recognise that minor errors can occur in a large database, but they treat this as a reason to make the data as visible as possible rather than as an argument against publishing it.

Changes to the Wikipedia articles are not copied immediately into the database, but are collected by a script and manually checked. The Pfam maintainers write:

Most edits are simple format or typographic improvements, but many have also provided valuable scientific content, including significant improvements to and expansion of important articles.[1]
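The section does not say how Pfam's collection script works. Purely as an illustration of the same division of labour, here is a minimal Python sketch, assuming the script polls the public MediaWiki API for revisions made since the last sync and queues them for a curator. `build_revisions_query`, `collect_pending`, `PendingEdit`, the article title and the sample response are all hypothetical. The query parameters (`prop=revisions`, `rvprop`, `rvend`, `formatversion=2`) are standard MediaWiki API parameters, but the network request itself is left out so the sketch runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode

# Public MediaWiki API endpoint for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


@dataclass
class PendingEdit:
    """One Wikipedia revision waiting for a curator (hypothetical record type)."""
    title: str
    revid: int
    user: str
    timestamp: str
    comment: str
    approved: bool = False  # only ever set by a human reviewer


def build_revisions_query(title: str, since: str, limit: int = 50) -> str:
    """Return a MediaWiki API URL listing revisions of one article back to `since`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvend": since,  # oldest timestamp to enumerate (default order is newest first)
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # pages come back as a list rather than a dict
    }
    return API_URL + "?" + urlencode(params)


def _parse_ts(ts: str) -> datetime:
    # MediaWiki timestamps end in "Z"; older Pythons need an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def collect_pending(response: dict, since: str) -> list[PendingEdit]:
    """Gather revisions strictly newer than `since`, oldest first, for manual checking."""
    cutoff = _parse_ts(since)
    pending = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            if _parse_ts(rev["timestamp"]) > cutoff:
                pending.append(PendingEdit(
                    title=page["title"],
                    revid=rev["revid"],
                    user=rev.get("user", ""),
                    timestamp=rev["timestamp"],
                    comment=rev.get("comment", ""),
                ))
    pending.sort(key=lambda e: _parse_ts(e.timestamp))
    return pending


# Hypothetical response in the formatversion=2 shape, for an invented article.
SAMPLE_RESPONSE = {
    "query": {
        "pages": [
            {
                "title": "Example protein family",
                "revisions": [
                    {"revid": 3, "timestamp": "2024-01-03T09:00:00Z",
                     "user": "B", "comment": "expand description"},
                    {"revid": 2, "timestamp": "2024-01-02T08:00:00Z",
                     "user": "A", "comment": "fix typo"},
                    {"revid": 1, "timestamp": "2023-12-31T23:00:00Z",
                     "user": "C", "comment": "already synced"},
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    since = "2024-01-01T00:00:00Z"
    print(build_revisions_query("Example protein family", since))
    for edit in collect_pending(SAMPLE_RESPONSE, since):
        print(edit.revid, edit.timestamp, edit.comment)
```

The sketch mirrors the workflow described in the text: the script only gathers and orders edits, and the `approved` flag is left for a person to set, so nothing reaches the database without a manual check.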

With a more controversial topic (such as current politicians), the proportion of vandal edits would be much higher, but protein families are uncontroversial enough that almost anyone editing an article is at least trying to improve it.

So the public has a role to play in maintaining a scholarly database. Since “the public” includes highly qualified people, this role is not restricted to low-value contributions. Collaboration between the professional Pfam community and interested members of the public would be impossible to arrange in advance, but thanks to wiki technology and shared data the two groups can work in parallel.


  1. Finn, R. D. et al. (1 January 2014). "Pfam: the protein families database". Nucleic Acids Research, 42 (D1). doi:10.1093/nar/gkt1223