Ingest (or Dispose)

Data Preparation

Unless collected data have already been processed, they exist in their raw format. To be usable for scientific purposes, they need to be prepared. This is especially important if third parties are to reuse the research data without difficulty.

Data preparation primarily involves formatting the data, checking for and correcting errors, and providing relevant documentation, with the aim of creating a high-quality dataset that can be analyzed without obstacles. The preparation steps and the extent of preparation differ for quantitative and qualitative research data.

Preparing Quantitative Research Data

There are some standard steps for preparing quantitative data, including the following:

  • Assigning correct, comprehensive, and understandable variable labels and value labels
  • Defining missing values
  • Creating a codebook describing the variables present in the dataset
  • Ensuring compliance with legal requirements, for example by anonymizing or pseudonymizing personally identifiable data
  • Performing plausibility and consistency checks
  • Formatting file names
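
Several of the steps above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the variable names, the missing-value codes (-99, -88), the valid ranges, the sample records, and the secret key are all hypothetical and not taken from any of the guides mentioned here. It recodes declared missing values, runs simple plausibility checks, replaces a direct identifier with a keyed hash, and prints a plain-text codebook.

```python
import hashlib
import hmac

# Assumed missing-value codes and their meaning (hypothetical).
MISSING_CODES = {-99: "no answer", -88: "not applicable"}

# Assumed variable definitions: labels, value labels, valid ranges (hypothetical).
VARIABLES = {
    "age": {"label": "Age of respondent in years", "valid_range": (18, 99)},
    "satisfaction": {
        "label": "Overall life satisfaction",
        "values": {1: "very dissatisfied", 2: "dissatisfied", 3: "neutral",
                   4: "satisfied", 5: "very satisfied"},
    },
}

# Made-up raw records for illustration only.
records = [
    {"respondent": "alice@example.org", "age": 34, "satisfaction": 4},
    {"respondent": "bob@example.org", "age": -99, "satisfaction": 2},
    {"respondent": "carol@example.org", "age": 340, "satisfaction": 7},
]


def recode_missing(record):
    """Replace declared missing-value codes with None."""
    return {k: (None if v in MISSING_CODES else v) for k, v in record.items()}


def plausibility_issues(record):
    """Return a list of range and value-label violations for one record."""
    issues = []
    for name, spec in VARIABLES.items():
        value = record.get(name)
        if value is None:
            continue  # missing values are not checked
        if "valid_range" in spec:
            low, high = spec["valid_range"]
            if not low <= value <= high:
                issues.append(f"{name}={value} outside {low}-{high}")
        if "values" in spec and value not in spec["values"]:
            issues.append(f"{name}={value} has no value label")
    return issues


def pseudonymize(identifier, secret_key):
    """Replace a direct identifier with a shortened HMAC-SHA256 digest."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def build_codebook():
    """Render variable labels, value labels, and missing codes as text."""
    lines = []
    for name, spec in VARIABLES.items():
        lines.append(f"{name}: {spec['label']}")
        for code, text in spec.get("values", {}).items():
            lines.append(f"  {code} = {text}")
        for code, text in MISSING_CODES.items():
            lines.append(f"  {code} = {text} (missing)")
    return "\n".join(lines)


SECRET_KEY = b"replace-with-a-project-secret"  # assumption: stored separately

prepared = []
for raw in records:
    clean = recode_missing(raw)
    clean["respondent"] = pseudonymize(raw["respondent"], SECRET_KEY)
    clean["issues"] = plausibility_issues(clean)
    prepared.append(clean)

print(build_codebook())
for row in prepared:
    print(row)
```

Note that a keyed hash only pseudonymizes: anyone holding the key can re-create the mapping, so this step alone does not anonymize the data, and whether it satisfies the applicable legal requirements has to be assessed separately.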

A simple guide to conducting quantitative data analysis properly and systematically is available for download: A Really Simple Guide to Quantitative Data Analysis (Samuels, P. (2020), Birmingham City University).

You can also make data preparation easier for researchers by providing a document that describes the effort it requires, following the example of the Contributor’s Guide to Preparing and Archiving Quantitative Data (NDACAN). Ideally, such a document helps researchers pay attention to specific aspects during data collection, which ultimately reduces the preparation effort.

Source: National Data Archive on Child Abuse and Neglect (NDACAN). (2022). Contributor’s guide to preparing and archiving quantitative data (3rd ed.). Ithaca, New York: Cornell

Preparing Qualitative Research Data

Qualitative data, by contrast, are typically less standardized and therefore require more elaborate preparation before they can be reused.

The following video contains more information on the anonymization of qualitative and quantitative data. It covers general procedures and best practices and also addresses data privacy considerations.