Appraise & Select

RDM Consultation: Selection Guidelines for Researchers

The intended reuse purpose is crucial for the selection of research data for archiving in a research data center. Possible decision-making approaches for data selection include:

  • For further publications: For this purpose, referenced (i.e., processed) data with additional documentation (metadata) are required.
  • For teaching: For this purpose, samples of original data and compiled data, including analysis steps, are useful.
  • For verification of research results: Referenced data, including the analysis steps needed to trace every result, should be included for this purpose.
  • For further analysis: If possible, all original data should be archived for this purpose.
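
The mapping from reuse purpose to material can also be written down as a small lookup table, for example to pre-fill a deposit form. The following is a minimal sketch only: the enum, the category labels, and the function name are hypothetical and are not taken from any repository's software or from the handbook.

```python
from enum import Enum


class ReusePurpose(Enum):
    FURTHER_PUBLICATIONS = "further publications"
    TEACHING = "teaching"
    VERIFICATION = "verification of research results"
    FURTHER_ANALYSIS = "further analysis"


# Hypothetical category labels that paraphrase the list above.
RECOMMENDED_MATERIAL = {
    ReusePurpose.FURTHER_PUBLICATIONS: {"referenced_data", "documentation_metadata"},
    ReusePurpose.TEACHING: {"original_data_samples", "compiled_data", "analysis_steps"},
    ReusePurpose.VERIFICATION: {"referenced_data", "analysis_steps"},
    ReusePurpose.FURTHER_ANALYSIS: {"all_original_data"},
}


def material_to_archive(purposes):
    """Return the sorted union of material suggested for the given purposes."""
    selected = set()
    for purpose in purposes:
        selected |= RECOMMENDED_MATERIAL[purpose]
    return sorted(selected)


if __name__ == "__main__":
    # A dataset intended for both teaching and verification.
    print(material_to_archive([ReusePurpose.TEACHING, ReusePurpose.VERIFICATION]))
```

When several purposes apply, the sketch simply takes the union of the suggested material; the final selection remains a judgment for researchers and curators.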

It is up to researchers and data curators to decide which data are actually relevant for potential reuse. The checklist below can help in determining whether research data are worthy of archiving.

You can also support researchers by providing information and explanations about relevant criteria and processes on your website. For example, the research data center (FDZ) of the Leibniz Institute for Psychology (ZPID) offers resources on “How can I submit my research data?” and on topics related to data integrity and data selection.

Checklist for Clarifying the Archival Worthiness of Research Data

  • Are there any third-party requirements (e.g., from research funding institutions, data policies, institutional guidelines) that make long-term storage necessary?
  • Do the data providers have the necessary usage rights for data sharing? Under what conditions do they “own” the data?
  • Are the collected data unique and not reproducible, or are the costs of reproduction higher than the costs of long-term storage?
  • Is a new data collection unlikely to yield better results?
  • Is there a high interest in reusing the research data?
  • Have the data not yet been fully explored scientifically?
  • Are the data typical or atypical for a research field, or are there unique research results?
  • Do the data have general or regional historical significance?
  • Is the data quality good in terms of technology and content?
  • Are the descriptive metadata complete, or can they be generated?
  • Can the necessary preservation metadata (reference, provenance, context, and persistence information, as well as access rights information) be provided?

[Source: Weber, Andreas and Piesche, Claudia. “4.2 Data Storage, Curation, and Long-Term Availability.” Practical Handbook of Research Data Management, edited by Markus Putnings, Heike Neuroth, and Janna Neumann, Berlin, Boston: De Gruyter Saur, 2021, pp. 327-356.]
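
For consultations, the checklist above could be recorded per dataset so that unanswered questions stay visible. The sketch below is one possible way to do this under stated assumptions: the question keys and the `summarize` function are hypothetical, and the source defines no scoring, weighting, or threshold, so the code only groups answers and does not make the archiving decision.

```python
# Hypothetical short keys for the eleven checklist questions above.
CHECKLIST = [
    "third_party_requirements",
    "usage_rights",
    "unique_or_costly_to_reproduce",
    "new_collection_unlikely_better",
    "high_reuse_interest",
    "not_fully_explored",
    "typical_atypical_or_unique_results",
    "historical_significance",
    "good_data_quality",
    "descriptive_metadata_available",
    "preservation_metadata_available",
]


def summarize(answers):
    """Group checklist keys by answer.

    `answers` maps a key to True (yes) or False (no); keys that are
    missing or set to None count as still open.
    """
    summary = {"yes": [], "no": [], "open": []}
    for key in CHECKLIST:
        answer = answers.get(key)
        if answer is True:
            summary["yes"].append(key)
        elif answer is False:
            summary["no"].append(key)
        else:
            summary["open"].append(key)
    return summary


if __name__ == "__main__":
    example = {"third_party_requirements": True, "usage_rights": False}
    result = summarize(example)
    print("yes:", result["yes"])
    print("no:", result["no"])
    print("still open:", len(result["open"]))
```

A "no" on usage rights would normally need to be resolved before anything else is weighed, which is one reason the sketch reports answers individually instead of adding them up.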