Ingest (or Dispose)

Data Intake

Once the submitted research data has been checked for suitability for inclusion in an archive or research data center, the corresponding files must be transferred to an appropriate in-house infrastructure or repository in accordance with transparent infrastructure policies and legal requirements. Files that are not suitable for archiving must be securely and permanently deleted.

To communicate the conditions of data acquisition transparently to researchers who submit data to your research data center, you can set up a service catalog. For example, the Research Data Center of the Leibniz Institute for Psychology (ZPID) provides an overview of its services and the associated cost models on its homepage.

Work Steps

Data Transfer to the Archive/Research Data Center/Repository

Data ingestion refers to all processes carried out to integrate data into the infrastructure of an archive or a research data center. Data intake can occur at the end of a project or research phase, or at multiple stages throughout the research process. When research data is handed over during the project phase, it should be possible to set an embargo period for any later data publication, so that the data can be kept confidential at least until the end of the project.

Scientific research sometimes uses data that has a special need for protection. There are various reasons for this, such as the personal reference of the data, ethical considerations, or economic relevance. Some research projects require data transfer between collaborators or between the place of origin and the location where the data analysis or, ultimately, the data archiving takes place. This data transfer must be specially secured so that the data remains protected in transit.

Setting up a secure data transfer workflow for a research project depends on numerous conditions: the existing infrastructure at the institution, the required level of data protection, the nature of the origin and destination (including the infrastructure available on the other side), and the frequency and volume of data transfer. Because these conditions are specific to each project, the responsible colleagues at your own institution are the recommended first points of contact; they will work with you to develop an individual solution. Depending on the topic, these may include a data protection officer, an IT security officer, and the local research data management team.

The following passage presents some possible components of a secure data transfer workflow. No individual component provides sufficient data protection on its own: the entire information network must be considered and secured. This includes the infrastructural, organizational, personnel, and technical components of data processing. It encompasses the storage locations of the data, the transmission paths through which they are transported, the people responsible, and all other components related to data processing. In consultation with local contacts, the following approach is recommended:

  1. Determine the data protection requirements
  2. Describe/define the information network (infrastructural, organizational, personnel, and technical components of data processing)
  3. Organize/set up measures to secure all components of the information network, including the infrastructure facilities, according to the determined data protection requirements
  4. Document and regularly review the measures

One recommended transmission path is the use of a secure service, such as Cryptshare. Cryptshare is a file transfer solution for exchanging confidential information or data securely. Its primary benefit is that data can be transmitted without resorting to insecure methods such as unencrypted email attachments or unprotected cloud storage. By using encryption technologies and further security measures, Cryptshare protects the transferred files from unauthorized access.

An essential aspect of transferring data securely is the protection of files, folders, or drives. This can be achieved through encryption. In this process, a string of characters (referred to as “plaintext”) is transformed into a seemingly random string of characters (referred to as “ciphertext”) in such a way that the original string can only be restored using a key. Keys are typically strings of fixed length, which are usually derived by an algorithm from a password or passphrase chosen by the user. Specialized programs are available for encryption (see, for example, this list from the UK Data Archive). Passwords must be managed with particular care so that they are neither lost nor fall into unauthorized hands. Password management is a central aspect of data management plans.
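The step from a passphrase to a fixed-length key can be illustrated with a minimal Python sketch. It assumes PBKDF2-HMAC-SHA256 from the standard library as the derivation algorithm; the function name `derive_key`, the iteration count, and the example passphrase are illustrative assumptions, not a recommendation from this guide. The sketch only derives a key and does not encrypt anything; for actual encryption, use one of the specialized programs mentioned above.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes,
               iterations: int = 600_000, length: int = 32) -> bytes:
    """Derive a fixed-length key from a passphrase (PBKDF2-HMAC-SHA256).

    The iteration count and key length are assumptions for this sketch.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=length
    )


# A random salt makes the same passphrase yield a different key each time.
# It is not secret, but it must be stored to re-derive the key later.
salt = secrets.token_bytes(16)
key = derive_key("an example passphrase chosen by the user", salt)
print(len(key))  # 32 bytes, regardless of the passphrase length
```

The same passphrase and salt always produce the same key, which is why losing either makes the encrypted data unrecoverable.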

Examples of research data with high protection requirements:

  1. Interviews with sensitive data, e.g., on personal conflict behavior
  2. Scientific data subject to an embargo
  3. Film recordings of children in interaction studies

Examples of secure research data transmission with high protection requirements:

  1. Over a network drive in the local area network (LAN) or via VPN
  2. Through a cloud service with server-side encryption (e.g., Nextcloud)
  3. Through university-owned cloud services such as Sciebo, Nextcloud (data encrypted)
  4. Through a cloud service with certified end-to-end encryption (e.g., Teamdrive)
  5. As an attachment in asymmetrically encrypted email
  6. As an encrypted attachment (zip file) via email

Note: These are recommendations. It is always advisable to seek consultation from your institution beforehand, especially when transmitting data with very high protection requirements.

It is important to document how the security of data transmission was ensured, so that the documentation can be consulted when needed and every step remains traceable. A sensible approach is to reference this documentation from the data management plan. Specifics regarding data collection abroad (as long as the GDPR does not apply there) should be clarified with the data protection officer.
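One possible way to keep such documentation traceable is a small machine-readable record per transfer. The following Python sketch is only an illustration: the function `build_transfer_record`, all field names, and the example values are hypothetical and should be adapted to the conventions of your institution.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_transfer_record(filename: str, channel: str, encryption: str,
                          sha256: str, sender: str, recipient: str) -> dict:
    """Collect the facts needed to trace one transfer (hypothetical schema)."""
    return {
        "file": filename,
        "channel": channel,        # e.g. VPN, Cryptshare, encrypted email
        "encryption": encryption,  # how the file itself was protected
        "sha256": sha256,          # checksum of the file as sent
        "sender": sender,
        "recipient": recipient,
        "transferred_at": datetime.now(timezone.utc).isoformat(),
    }


# Example values are invented for illustration only.
record = build_transfer_record(
    filename="interviews_wave1.zip",
    channel="Cryptshare",
    encryption="password-protected archive, password sent by phone",
    sha256=hashlib.sha256(b"example file contents").hexdigest(),
    sender="project team",
    recipient="research data center",
)
print(json.dumps(record, indent=2))
```

Such records can be stored alongside the project documentation and referenced from the data management plan.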

For password-protected data, it is, of course, essential to store and transmit the password separately from the data. For instance, when transmitting data via Cryptshare, it is recommended to convey the password through a separate phone call or email.
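A password that is to be conveyed over a separate channel should itself be hard to guess. As a minimal sketch, a random password can be generated with Python's standard `secrets` module; the function name `generate_password`, the default length of 24, and the alphabet are assumptions of this example.

```python
import secrets
import string

# Letters and digits only, so the password is easy to read out over the phone.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
print(len(password))  # 24
```

The generated password is then passed on by phone or a separate message, never in the same message as the protected data.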

Preparing Data for Transfer to (Long-term) Archiving

Before research data is accepted into the archive/repository, several procedures must be carried out. These include:

  • Assigning a persistent identifier (PID)
  • Scanning for malware in the submitted files
  • Extracting (machine reading), importing, or creating relevant metadata (both technical and descriptive)
  • Technical validation of data and metadata (if necessary, converting file formats into non-proprietary formats)
  • Potentially rechecking data and metadata for completeness and accuracy (in addition to initial data entry checks)
  • Dividing/combining files according to internal infrastructure conventions (e.g., container files)
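Some of the steps listed above can be partly automated. The following Python sketch covers only the extraction of basic technical metadata for a single file, using the standard library. The function name `technical_metadata` and the field names are hypothetical, and the UUID is merely an internal placeholder, not a registered persistent identifier such as a DOI. Malware scanning and format validation require dedicated tools and are not shown.

```python
import hashlib
import mimetypes
import tempfile
import uuid
from pathlib import Path


def technical_metadata(path: Path) -> dict:
    """Extract basic technical metadata for one submitted file."""
    data = path.read_bytes()
    mime_type, _ = mimetypes.guess_type(path.name)  # guessed from the extension
    return {
        "internal_id": str(uuid.uuid4()),  # placeholder, NOT a real PID
        "filename": path.name,
        "size_bytes": len(data),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Demonstration with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_bytes(b"hello archive\n")
    metadata = technical_metadata(sample)

print(metadata["filename"], metadata["size_bytes"])
```

Descriptive metadata (title, authors, study description) usually cannot be derived from the file itself and has to be supplied by the submitting researchers.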

Archival Transfer

When files are transferred to archival storage, digital signatures or checksums are generated to verify the integrity and completeness of the files. In this process, the checksum computed before storage is compared with the checksum computed after retrieval. This makes it possible to determine whether the files have been altered or whether errors occurred during data transmission.

Source: Jensen, Uwe (2012): Guidelines for Research Data Management: Social Science Survey Data. GESIS-Technical Reports, 2012/07 (https://nbn-resolving.org/urn:nbn:de:0168-ssoar-320650)
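The checksum comparison described under Archival Transfer can be sketched in Python as follows. The sketch assumes SHA-256 as the checksum algorithm (the source does not prescribe one); the function names `sha256_of_file` and `verify_file` and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of_file(path) == expected_checksum


# Demonstration: record a checksum, then detect a later change.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "survey.csv"
    sample.write_bytes(b"id,answer\n1,yes\n")
    checksum_at_ingest = sha256_of_file(sample)

    intact = verify_file(sample, checksum_at_ingest)

    sample.write_bytes(b"id,answer\n1,no\n")  # simulate an alteration
    altered_detected = not verify_file(sample, checksum_at_ingest)

print(intact, altered_detected)  # True True
```

Reading in chunks keeps memory use constant, which matters for large research data files. The checksum recorded at ingest has to be stored separately from the file so that it can serve as a reference later.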

In the article "Tips & Checklists" associated with the following process step, you will also find some sample contracts for the transfer of data by researchers to your research data center.