Appraise & Select

After research data have been received at your RDC, they are reviewed against a set of criteria. Which criteria apply and how granular the review is depend on RDC-internal agreements.

For this review, it is helpful to have a checklist in which all details of the verification can be listed and worked through in a structured manner. The check of incoming data distinguishes between formal and content-related aspects. If relevant information is missing or documents are not submitted in full, the data provider may have to make corrections.

Example: SowiDataNet checklist for data ingest control of incoming research data

A Technical quality check

1) Readability of files:

Answer options

Can all files (data and documentation) be rendered?

Yes / No

Are all files in preferred formats?

Yes / No / Not checked

If not, which files are not in preferred formats?

Free text

In which formats are the files and which software is required to render the files?

Free text

Were all files checked for viruses and other malware?

Yes / No / Not checked

Further comments on the readability of files:

Free text

B Legal quality check

1) Intellectual Property Rights, licenses, usage rights:

Answer options

Does the data depositor and/or your institute have all the necessary Intellectual Property Rights and/or usage rights to publish the data?

Yes / No

Was secondary (i.e. re-used) data submitted?

Yes / No / Not checked

If yes: Has the publication of the data been authorized by the rights holder(s)?

Yes / No / Not checked

Was automatically generated data submitted (e.g. with data mining or web scraping)?

Yes / No / Not checked

If yes: Is the publication of the data allowed according to the terms of service/terms of use of the data source?

Yes / No / Not checked

Have all submitted supplementary documents (e.g. articles, reports) been authorized by the respective authors?

Yes / No / Not checked

Does the submission contain any other material that could be subject to copyright issues?

Yes / No / Not checked

If yes: Found in:

Free text

File name(s)

Free text

Further comments on Intellectual Property Rights, licenses and usage rights:

Free text

2) Data Protection:

Answer options

Does the data contain personal data (e.g. names, addresses, IP or e-mail addresses)?

Yes / No / Does not apply

If yes: was consent for publication sought from the data subjects?

Yes / No / Does not apply

Does the data contain special categories of personal data (so-called sensitive data), e.g. data concerning health, sexuality, or ethnicity?

Yes / No / Does not apply / Not checked

Does the data contain very small-scale regional units?

Yes / No / Does not apply / Not checked

Does the data contain very fine-grained classifications of occupations?

Yes / No / Does not apply / Not checked

Are the data subjects from a special and/or very small population?

Yes / No / Does not apply / Not checked

Are combinations of variables possible that would lead to a de-anonymization/re-identification of data subjects (e.g. occupation and geo-information)?

Yes / No / Does not apply

Could linking with other datasets lead to a de-anonymization of data subjects?

Yes / No / Does not apply / Not checked

Do the responses to open-ended questions contain sensitive information falling under data protection regulations?

Yes / No / Not checked

Does the submission contain qualitative data?

Yes / No / Not checked

If yes: has the data been completely anonymized?

Yes / No / Not checked

Does the assigned access class comply with data protection requirements?

Yes / No / Not checked

Further comments on data protection:

Free text
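
The question above about combinations of variables that could lead to re-identification can be illustrated with a small count of value combinations. The following is only a minimal sketch and not part of the SowiDataNet checklist: the function name `small_cells`, the variable names `occupation` and `region`, and the threshold `k` are assumptions made for illustration. The idea is that any combination of quasi-identifying values shared by fewer than `k` cases deserves a closer look.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k=5):
    """Return value combinations shared by fewer than k records.

    records: list of dicts, one per data subject
    quasi_identifiers: variable names that could identify a person in combination
    k: hypothetical threshold; choose it according to your own policy
    """
    counts = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


def demo():
    # Invented example records, for illustration only.
    records = [
        {"occupation": "teacher", "region": "A"},
        {"occupation": "teacher", "region": "A"},
        {"occupation": "mayor", "region": "A"},
        {"occupation": "nurse", "region": "B"},
        {"occupation": "nurse", "region": "B"},
    ]
    return small_cells(records, ["occupation", "region"], k=2)
```

In this invented example, `demo()` flags only the combination of "mayor" and region "A", because it occurs once. A count like this cannot prove that data are anonymous; it only points to cells that a curator may want to coarsen or discuss with the data provider.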

C Content quality check

1) Data ingest:

Answer options

Are the data in scope for inclusion in SowiDataNet|datorium (research data from social or economic sciences)?

Yes / No

Do the submitted data files match the described project?

Yes / No

Are all data sets included that are described in the documentation?

Yes / No / Does not apply

Does the number of cases in the data match the number of cases stated in the documentation?

Yes / No / Does not apply

Are all variables included that are described in the documentation?

Yes / No / Does not apply

Do the variables match the survey instrument, e.g. the questionnaire?

Yes / No / Does not apply

Further comments on ingest:

Free text

2) Data preparation:

Answer options

Are unique IDs included?

Yes / No / Does not apply / Not checked

Are there coding mistakes and/or implausible variable values?

Yes / No / Does not apply / Not checked

Are there variable labels?

Yes / No / Does not apply / Not checked

If so:

Free text

Are they understandable?

Yes / No / Does not apply / Not checked

Are the labels correct with regard to content?

Yes / No / Does not apply / Not checked

Are there value labels?

Yes / No / Does not apply / Not checked

If so:

Free text

Are they understandable?

Yes / No / Does not apply / Not checked

Are the labels correct with regard to content?

Yes / No / Does not apply / Not checked

Are there missing values?

Yes / No / Does not apply / Not checked

If so:

Free text

Are they understandable?

Yes / No / Does not apply / Not checked

Have they been coded correctly?

Yes / No / Does not apply / Not checked

Does the data contain skip patterns?

Yes / No / Does not apply / Not checked

If so:

Free text

Is the skip pattern correct?

Yes / No / Does not apply / Not checked

Is the skip pattern documented?

Yes / No / Does not apply / Not checked

Do the data contain weighting factors?

Yes / No / Does not apply / Not checked

If so:

Free text

Is the weighting plausible?

Yes / No / Does not apply / Not checked

Is the weighting documented?

Yes / No / Does not apply / Not checked

Further comments on the data preparation:

Free text

3) Metadata:

Answer options

Were all mandatory fields filled correctly?

Yes / No

Were institution-specific mandatory metadata fields filled completely and correctly?

Yes / No / Does not apply

Do the metadata values match the data?

Yes / No / Not checked

Do the metadata values match the corresponding information in the documentation?

Yes / No / Does not apply / Not checked

Does the access class match the institution-specific policy (if applicable)?

Yes / No / Does not apply / Not checked

Does the license match institution-specific policies (if applicable)?

Yes / No / Does not apply / Not checked

Are version numbers and references to preceding or following versions stated correctly?

Yes / No / Does not apply / Not checked

Further comments on metadata:

Free text

4) Documentation:

Answer options

Is the data accompanied by documentation files?

Yes / No / Not checked

If so:

Free text

Questionnaire or other measurement instrument?

Yes / No / Not checked

Codebook?

Yes / No / Not checked

Methods report?

Yes / No / Does not apply / Not checked

Project report or Technical Report?

Yes / No / Does not apply / Not checked

Scripts or syntax files?

Yes / No / Does not apply / Not checked

Are there further documents?

Yes / No / Not checked

Are there references to further documentation on external websites?

Yes / No / Not checked

If so:

Free text

Do the URLs resolve at the time of publication?

Yes / No / Not checked

Are there references to other data publications and/or other repositories?

Yes / No / Not checked

If so:

Free text

Were persistent identifiers used?

Yes / No / Not checked

Do the URLs/Persistent Identifiers resolve correctly?

Yes / No / Not checked

Are there references to publications related to the data?

Yes / No / Not checked

If so:

Free text

Are these described with adequate bibliographical information?

Yes / No / Not checked

Are they referenced with Persistent Identifiers?

Yes / No / Not checked

Do the URLs/Persistent Identifiers resolve correctly?

Yes / No / Not checked

Further comments on documentation:

Free text

D Summary of the quality check

Summary:

Answer options

Are the data quality and the quality of the documentation sufficient for data re-use?

Yes / No

Are the data quality and the quality of the documentation sufficient for including the study in SowiDataNet|datorium?

Yes / No

Final comments:

Free text

[Source: GESIS – Leibniz Institut für Sozialwissenschaften, Checkliste für Institutskuratorinnen und -kuratoren von SowiDataNet]

Formal Data Input Check

Formal review of the resources received includes, for example, the following steps:

  • Are the files the correct ones, as agreed upon? Were the correct file formats sent, or were the wrong files sent by mistake?
  • Do the documents sent belong together, i.e., do they all belong to the same project?
  • Are the data complete, or were files forgotten?
  • Was the correct metadata sent?
  • Are the files intact, i.e., can they be opened and are their contents readable?
  • Are the files virus-free?
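
Some of these formal steps can be supported by a script. The following is a minimal sketch, assuming that the data provider supplies a manifest of file names and SHA-256 checksums; the function names, the manifest, and the list `PREFERRED_SUFFIXES` are hypothetical and would have to be adapted to your RDC's own agreements. It does not replace a virus scan or actually opening the files.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical list of preferred formats; replace it with your RDC's own policy.
PREFERRED_SUFFIXES = {".csv", ".txt", ".pdf", ".xml"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def formal_check(folder: Path, expected: dict[str, str]) -> list[str]:
    """Compare a delivery folder against a manifest {file name: SHA-256}.

    Returns a list of findings; an empty list only means that these
    limited checks did not detect a formal problem.
    """
    findings = []
    delivered = {p.name: p for p in folder.iterdir() if p.is_file()}

    for name in sorted(expected.keys() - delivered.keys()):
        findings.append(f"missing file: {name}")
    for name in sorted(delivered.keys() - expected.keys()):
        findings.append(f"unexpected file: {name}")

    for name, path in sorted(delivered.items()):
        if path.stat().st_size == 0:
            findings.append(f"empty file: {name}")
        if path.suffix.lower() not in PREFERRED_SUFFIXES:
            findings.append(f"not a preferred format: {name}")
        if name in expected and sha256_of(path) != expected[name]:
            findings.append(f"checksum mismatch: {name}")
    return findings


def demo() -> list[str]:
    """Build an invented delivery in a temporary folder and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "data.csv").write_text("id,age\n1,34\n", encoding="utf-8")
        (folder / "notes.docx").write_bytes(b"")
        manifest = {
            "data.csv": sha256_of(folder / "data.csv"),
            "codebook.pdf": "0" * 64,  # agreed upon, but not delivered
        }
        return formal_check(folder, manifest)
```

In the invented delivery built by `demo()`, the sketch reports the codebook as missing and flags `notes.docx` as unexpected, empty, and not in a preferred format. Findings of this kind can be copied into the checklist and sent back to the data provider.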

Content Data Input Check

For the content check, you have to examine the received files in detail. How granular this check is depends on the RDC-specific regulations and the data types (quantitative or qualitative data). Checking the content of the resources is particularly important because it is a prerequisite for subsequent use by other researchers. For example, the following criteria are checked:

  • Are all data protection requirements met?
  • Is it necessary to anonymize or pseudonymize the data?
  • Which data access paths are suitable for the resources? Is publication possible at all, or only under certain security measures, such as remote access?
  • Where appropriate, subject the data to data cleansing. This involves checking, for example, that variables and values are labeled, that weights have been applied correctly, and that filter routing (skip patterns) is correct.
  • In the case of qualitative data in the form of videos or audio recordings, check whether the data are individual recording units or a complete data collection.
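
For quantitative data, a few of the cleansing checks named above can be sketched in code. The example below is an illustration under assumptions, not a prescribed procedure: it assumes the data are available as a CSV table, that the number of cases, the variable names, and the variable labels have been transcribed from the documentation by hand, and all names (`content_check`, `SAMPLE_CSV`, the variables) are invented. Weights and skip patterns are not covered.

```python
import csv
import io
from collections import Counter


def content_check(rows, id_column, documented_cases, documented_variables, variable_labels):
    """Run a few basic consistency checks on a tabular data set.

    rows: list of dicts, one per case (e.g. from csv.DictReader)
    documented_cases: number of cases stated in the documentation
    documented_variables: variable names listed in the codebook
    variable_labels: mapping of variable name to label text
    """
    findings = []

    if len(rows) != documented_cases:
        findings.append(f"case count: {len(rows)} in data, {documented_cases} documented")

    id_counts = Counter(row[id_column] for row in rows)
    duplicates = sorted(value for value, n in id_counts.items() if n > 1)
    if duplicates:
        findings.append(f"duplicate IDs: {', '.join(duplicates)}")

    in_data = set(rows[0].keys()) if rows else set()
    for name in sorted(set(documented_variables) - in_data):
        findings.append(f"documented but not in data: {name}")
    for name in sorted(in_data - set(documented_variables)):
        findings.append(f"in data but not documented: {name}")

    for name in sorted(in_data):
        if not variable_labels.get(name, "").strip():
            findings.append(f"no variable label: {name}")
    return findings


# Invented example data, for illustration only.
SAMPLE_CSV = "id,age,occ\n1,34,teacher\n2,51,nurse\n2,29,clerk\n"


def demo():
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    return content_check(
        rows,
        id_column="id",
        documented_cases=4,
        documented_variables=["id", "age", "income"],
        variable_labels={"id": "Respondent ID", "age": "Age in years"},
    )
```

Each finding is a starting point for a query to the data provider rather than a verdict. Whether a deviation is an error can usually only be decided together with the documentation.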

If a submission is incorrect or incomplete, you must contact the data provider. Clarify whether the changes required are minor ones that you, as data curator, can make yourself. If the changes are major, it may be advisable for the data provider to make them, to avoid erroneous changes on your part. In this case, the resources originally sent should be permanently deleted. The corrected resubmission then undergoes a new review.

Selection of the Data to Be Retained

The research data policy of a publisher or funding agency often specifies whether and which data should be retained. If this is not the case, the researchers themselves must weigh up which data are relevant for possible archiving and/or publication. This decision should depend on potential re-use and data quality.

For the selection and evaluation of research data, it is helpful if some aspects are already considered during the research and/or submission process. Advisory services may therefore also be required at this step of the process.

For guidance on relevant advisory topics, see the “How to Advise Researchers” help bar.

Checklist

  • Who is responsible for the selection and are there institution-internal criteria for data selection?
  • Is software used for systematic data selection or is selection done manually?
  • Are the specific accompanying documents (e.g. data documentation) complete?
  • Are the data the basis of a publication or intended as such? Have the standards of good scientific practice been followed?
  • Are the data reusable and useful in their form for further research?
  • Do the data have a particular political and/or social relevance?

[According to: Ludwig, Jens; Enke, Harry (Eds.) (2013): Leitfaden zum Forschungsdaten-Management. Handreichungen aus dem WissGrid-Projekt. Glückstadt: VWH Verlag Werner Hülsbusch Fachverlag für Medientechnik und -wirtschaft (translated by KonsortSWD).]

Further references

RDM guidance for your advisory practice

Data Selection

When selecting research data for archiving in a research data center, the purpose for further use is crucial. Possible decision-making approaches for the selection of data are, for example:

  • For further publications: referenced (i.e., processed) data with additional documentation (metadata) are necessary.
  • For teaching: samples of original data and compiled data, including analysis steps, are useful.
  • For verification of research results: referenced data, including the analysis steps, should be included so that all results can be traced.
  • For further analyses: if possible, all original data should be archived.

Data Evaluation

In the end, researchers and data curators must decide for themselves which data are actually relevant for potential re-use. The following checklist can help researchers decide whether data are worth archiving:

Checklist for clarifying the archivability of research data

  • Are there any third-party requirements (e.g., from research funding agencies, data policies, guidelines of the research institution) that make long-term storage necessary?
  • Do the data providers have the necessary rights to share the data? Under what conditions do they "own" the data?
  • Were the data collected only once and are they not reproducible, or are the costs of reproduction higher than the costs of long-term retention?
  • Is re-collection of the data unlikely to provide better results?
  • Is there a high level of interest in re-using the research data?
  • Have the data not yet been fully (scientifically) studied?
  • Are the data characteristic or atypical of a research area, or are they unique research findings?
  • Do the data have general or regional historical significance?
  • Is the data quality good, both technically and in terms of content?
  • Is the descriptive metadata complete, or can it be generated?
  • Can the necessary preservation metadata (reference, provenance, context, and persistence information, as well as information on access rights) be provided?

[Source: Weber, Andreas and Piesche, Claudia. "4.2 Datenspeicherung, -kuration und Langzeitverfügbarkeit". Praxishandbuch Forschungsdatenmanagement, edited by Markus Putnings, Heike Neuroth and Janna Neumann, Berlin, Boston: De Gruyter Saur, 2021, pp. 327-356. https://doi.org/10.1515/9783110657807-019 (translated by KonsortSWD)]