WG 2 – Data Collation


Maria Tsiafouli 
Department of Ecology, School of Biology 
Aristotle University of Thessaloniki, Greece 

Jérôme Cortet 
Centre d’Ecologie Fonctionnelle et Evolutive 
Université Paul-Valéry Montpellier 3, France 


Taxonomists and ecologists in individual European countries, as well as several EU projects, have been collecting soil biodiversity data for decades. Together these represent a vast pool of soil biodiversity data from across Europe, much of which is not publicly available (the “file-drawer” problem). A key goal of WG2 is to identify these data sources and to establish the procedures and tools necessary for collecting and integrating past, present and future data in a pan-European data warehouse.

  • A major task is to identify existing datasets throughout Europe and to determine their availability, degree of digitalisation and formats. Datasets will be evaluated for compliance with the vocabularies, data fields, formats etc. set in WG1. The collection of descriptive and environmental metadata for each dataset will be evaluated and ensured. Guidelines for rendering these datasets compatible with a common data warehouse will be developed. 
  • A principal task is to develop methodologies for building capacity to digitise data that is available only on paper. Training courses and workshops will offer assistance in building digitalisation capacity and in using data-upload software tools. As exemplary cases, larger datasets already available in digital form will be collated by partners who have access to them through their networks. These will form the basis for testing and improving data-upload tools, quality-control procedures (WG1) and data evaluation and assessment routines (WG6).
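As a rough illustration of the compliance evaluation described above, the following sketch checks whether a dataset's column header contains a set of required fields. The field list is hypothetical (borrowed here from Darwin Core term names); the actual requirements would be those set in WG1, and the function name `missing_fields` is an assumption made for this example.

```python
import csv
import io

# Hypothetical required fields; the actual list would be defined by WG1.
REQUIRED_FIELDS = {
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
}


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return the sorted required field names absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return sorted(set(required) - present)


# A small made-up dataset that lacks coordinates.
sample = (
    "scientificName,eventDate,abundance\n"
    "Lumbricus terrestris,2010-05-01,12\n"
)
print(missing_fields(sample))
```

A report of this kind could accompany each catalogued dataset to show how much work is needed before it can be integrated.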

Preparing data and importing it into external databases falls outside most people’s daily work tasks. Software tools for data upload are currently being developed and tested to ease this task. These tools allow data in various digital formats to be entered into the platform foreseen by the Action. 

  • WG2 will test this software on the exemplary datasets together with WG1, evaluate its efficiency and deliver guidance for its further development. Further development of the upload software will emphasise ease of use, so that data upload demands as little time as possible. 
  • For larger databases that allow continual data upload in the future, the Darwin Core data-exchange formats (wrapper software) already in place will be evaluated and further developed. 
  • WG2 will also work closely with WG4 to improve the user interfaces of these software tools. 
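To give a sense of what preparing data for a Darwin Core exchange format can involve, the sketch below renames a provider's local column names to Darwin Core terms. The mapping and the function name `to_darwin_core` are hypothetical and for illustration only; this is not the Action's upload or wrapper software.

```python
# Hypothetical mapping from a provider's local column names
# to Darwin Core terms.
LOCAL_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(record, mapping=LOCAL_TO_DWC):
    """Rename known keys to Darwin Core terms; keep unmapped keys unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


# A made-up record using local column names.
row = {
    "species": "Folsomia candida",
    "date": "2012-06-14",
    "lat": "40.63",
    "lon": "22.95",
    "depth_cm": "10",
}
print(to_darwin_core(row))
```

Columns without a known counterpart (here `depth_cm`) are passed through untouched, so they can be reviewed rather than silently dropped.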

A data policy (incl. data-sharing agreement) was developed prior to the Action. It sets terms of use for the data warehouse, describes how data from providers is maintained and curated, and defines an intellectual property rights (IPR) policy that offers data providers tiered levels of public access to their data, including embargo periods for unpublished data. It has been legally checked and complies with recent EU data-protection legislation. 
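A minimal sketch of how tiered access with embargo periods might be evaluated is given below. The access-level names, the `DatasetPolicy` class and the rule that an embargo blocks all access are assumptions made for illustration; the actual data policy defines its own terms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical access levels, ordered from least to most privileged.
ACCESS_LEVELS = ("public", "registered", "restricted")


@dataclass
class DatasetPolicy:
    access_level: str                     # minimum level needed to view the data
    embargo_until: Optional[date] = None  # first day on which access is allowed


def is_accessible(policy, user_level, today):
    """Return True if a user at user_level may view the dataset on the given day."""
    # Simplifying assumption: during an embargo nobody is granted access.
    if policy.embargo_until is not None and today < policy.embargo_until:
        return False
    # The user's level must be at least the level the dataset requires.
    return ACCESS_LEVELS.index(user_level) >= ACCESS_LEVELS.index(policy.access_level)


unpublished = DatasetPolicy("registered", embargo_until=date(2030, 1, 1))
print(is_accessible(unpublished, "registered", date(2029, 12, 31)))
print(is_accessible(unpublished, "registered", date(2030, 1, 1)))
```

In practice the data provider would retain access throughout an embargo; that case is omitted here for brevity.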

  • WG2 will evaluate this policy, especially considering EU legal developments regarding data protection and information security, compare it with data policies of other biodiversity databases and provide guidance for its amendment and expansion. 
  • During dataset identification, WG2 will review this data policy with data providers to ensure that their IPRs are adequately protected. 
  • Recommendations will be given for confirming and securing authorship credit for data providers whose data is used by others (e.g., via DOIs). 

A final data policy will be legally checked for compliance with (inter-)national data-protection legislation.

Intended Outcomes

(1) A catalogue of available datasets across Europe, including (2) guidelines and requirements for harmonising these datasets and rendering them compatible with a pan-European data warehouse. (3) Tested software tools for convenient data import. (4) A formal, legally checked data policy protecting data providers’ IPRs.