Ingest (or Dispose)

Data Documentation

Organizing and preparing research data carefully takes additional effort at first. Over the course of a project, however, data that are structured and prepared from the outset save time and effort. Furthermore, other researchers can only use research data effectively if the data come with understandable information. This includes, for example, the associated documentation material, which should be provided in a consistent and comprehensible form when the data are transferred into a research data infrastructure.

Publishing research data requires documentation that is understandable to third parties, as it significantly facilitates further use of the data and supports their reproducibility. This benefits not only potential data users but also the data producers and authors themselves. The better a dataset is documented, the more likely it is to be used and cited by others, thereby giving the data producers “credit” in the form of scientific citations.

When no specific requirements are imposed, for example by funding agencies or repositories, an appropriate documentation format must be determined independently. The data lifecycle can serve as a guide for this purpose. During the processing and analysis phase, the documentation should be updated as new versions of the data emerge. Documentation can be kept in a simple text document, in dedicated software, or in analogue form, for example in a laboratory notebook. If an analogue form of documentation is chosen, it is important to use permanent (document-proof) writing tools. Those using electronic documentation should choose a format that is as open as possible in order to facilitate access to the information and thus subsequent reuse of the data.

Documentation using Metadata

The description of research data is accomplished through metadata. Metadata contain structured information about other data and their characteristics. They ensure that digital data and objects can be discovered and searched.

In the following video from UGent Open Science, you will learn what metadata are and why they should be used:

Metadata are stored either separately from or together with the data they describe. To make them machine-readable, for example for use in the Semantic Web, metadata are often stored in XML format.
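
As an illustration of metadata stored separately from the data in XML, the following minimal Python sketch builds a small metadata record and reads it back. It is not a prescribed format: the element names (title, creator, date, description) are borrowed from the Dublin Core element set as an assumption, the root element name `metadata` and the function `build_metadata_xml` are hypothetical, and all field values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; using it here is an assumption of this sketch.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_metadata_xml(fields):
    """Return an XML string with one namespaced element per metadata field."""
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented example values, for illustration only.
example_fields = {
    "title": "Example survey dataset",
    "creator": "Doe, Jane",
    "date": "2024-01-15",
    "description": "Hypothetical record used only for illustration.",
}

xml_text = build_metadata_xml(example_fields)

# Because the record is structured, software can read individual fields back.
parsed = ET.fromstring(xml_text)
title = parsed.find(f"{{{DC_NS}}}title").text
print(xml_text)
print(title)
```

Saving `xml_text` in a file next to the dataset would correspond to storing the metadata independently of the data they describe.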

Further information can be found in the article ‘What Are Metadata?’ in the fundamentals section.

What Should Be Documented?

  • Context of data collection (project goals, hypotheses)
  • Data collection method (sampling method, data collection instruments, hardware and software used, secondary data sources, location, and timeframe of data collection)
  • Data structure and the relationships between the data
  • Quality measures such as cleaning, weighting, data validation, etc.
  • Data versions and the changes they contain
  • Information about access, usage conditions, and confidentiality
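
The points above can be kept as a simple structured record so that gaps in the documentation become visible early. The following Python sketch is one possible way to do this, not an established standard: the class `DatasetDocumentation`, its field names, and the helper `missing_sections` are hypothetical, and the example values are invented. JSON is used as the output format only because it is an open, widely readable format.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetDocumentation:
    """One field per item in the checklist above (hypothetical names)."""
    context: str = ""
    collection_method: str = ""
    data_structure: str = ""
    quality_measures: str = ""
    versions: list = field(default_factory=list)
    access_conditions: str = ""


def missing_sections(doc):
    """Return the names of all sections that are still empty."""
    return [name for name, value in asdict(doc).items() if not value]


# Invented example: two sections have not been filled in yet.
doc = DatasetDocumentation(
    context="Pilot study on commuting behaviour; hypothesis H1 only.",
    collection_method="Online questionnaire, random sample, spring 2024.",
    data_structure="One CSV file; one row per respondent.",
    versions=[{"version": "1.0", "changes": "Initial cleaned release."}],
)

# Store the record in an open format alongside the data.
doc_json = json.dumps(asdict(doc), indent=2)

print(missing_sections(doc))
print(doc_json)
```

Running such a check before transferring data into a repository shows which parts of the documentation still need to be written, here the quality measures and the access conditions.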