Access, Use & Reuse

  • Data Access Paths

    In addition to archiving data and handling the associated documentation and curation, one of the central functions of a research data center (RDC) is to facilitate data access for secondary use. There are various ways to do this; both the type of data and the purpose of reuse determine which data access path is appropriate. The data can, for example, be made accessible through an ordering system, often called a data catalog or search system. An RDC can provide research data through the following access paths:

    Download

    The most open form of data access involves downloading the desired data and associated documents from a data catalog to the user’s device. Depending on the data access strategy, this download may be entirely free or require registration, authentication, or the completion of a data usage agreement. Examples of data usage agreements can be found on the webpages of some institutions (e.g. SOEP).

    Secure Remote Access

    More restrictive data provision methods keep research data either on RDC servers or on servers provided by third-party services, granting users access to the data on these servers.

    The generic term ‘Secure Remote Access’ or ‘Remote Access’ encompasses various methods through which access is granted remotely, typically from the users’ workplace. This means that both data storage and data processing occur on RDC servers, and appropriate software for the data analysis (e.g., Stata, SPSS, MAXQDA) must be provided there. This type of data access requires a highly protected internal IT infrastructure within a segregated network of virtual machines. An example of technology enabling secure online access to research data is JOSUA.

    Remote Desktop

    When data is provided via the Remote Desktop method, users connect to the RDC server through client software, view the data, and analyze it using the software provided on the server. Users cannot download or import files themselves. All files that users want to import or export are subject to prior review by the RDC (Input/Output Control). After the review, the RDC staff provide users with the necessary input and output.

    Remote Execution

    In the more restrictive Remote Execution method, also known as ‘Controlled Remote Data Processing’ (CRDP) or ‘Remote Computation,’ users can also use software to connect to the RDC server. Unlike with Remote Desktop, however, they have no direct access to the data itself. Instead, they submit scripts or syntax for data modification, processing, and analysis to the RDC without seeing the data, and the RDC then executes them. Executing the scripts and transmitting the analysis results can be automated to varying degrees. All files that users want to import or export must first be reviewed by the RDC (Input/Output Control). Although most output controls are software-based, the output is usually also reviewed by RDC personnel. The RDC staff then provide users with the necessary input and output.
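    The combination of software-based output control and staff review can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from any particular RDC: cells of a frequency table based on fewer than a hypothetical MIN_CELL_COUNT cases are suppressed automatically, and every result is still flagged for manual review. The names OutputCheck and automated_output_check are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical threshold: cells based on fewer cases than this are suppressed.
MIN_CELL_COUNT = 5


@dataclass
class OutputCheck:
    """Result of the automated part of output control."""
    table: dict[str, Optional[int]]       # None marks a suppressed cell
    suppressed: list[str] = field(default_factory=list)
    needs_manual_review: bool = True      # staff review remains part of the process


def automated_output_check(frequencies: dict[str, int]) -> OutputCheck:
    """Suppress small cells in a frequency table before staff review."""
    table: dict[str, Optional[int]] = {}
    suppressed: list[str] = []
    for category, count in frequencies.items():
        if count < MIN_CELL_COUNT:
            table[category] = None
            suppressed.append(category)
        else:
            table[category] = count
    return OutputCheck(table=table, suppressed=suppressed)


# Made-up frequency table produced by a user's script.
result = automated_output_check({"employed": 412, "unemployed": 37, "other": 3})
print(result.table)
print(result.suppressed)
```

    In this sketch the automated step only prepares the output; the decision to release it would still rest with RDC staff.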

    Guest Scientist Workstations (GSW)

    With this form of data access, also referred to as guest stays, on-site access, or secure data centers, data access occurs on site at the RDC. This represents the most restrictive form of data access.

    Similar to Secure Remote Access, data storage and processing take place on central RDC servers. However, users do not work at their own workplace, but at specially equipped workstations on-site at the RDC. These workstations are usually equipped with a computer without internet access and without functional USB ports, drives, etc.

    Further rules govern the use of these facilities, for example prohibitions on taking photos, taking notes, and using mobile phones, laptops, and similar devices. Users can only view and analyze the data using the provided software; they cannot import or export data themselves. All files that users wish to import or export are subject to review (Input/Output Control). Examples of institutions offering GSW access include the Research Data Center of the IAB and GESIS – Leibniz Institute for the Social Sciences.

    GSW access can be extended further by networking GSWs from different institutions. The minimum standards for GSW networking, which cover room security and the technical environment, were developed by KonsortSWD in the RDCnet pilot project and can also be used as guidelines for GSW access.

    In the sections ‘Legal: Copyright & Data Reuse’ and ‘Tips & Checklists,’ you can find sample contracts and exemplary terms of use for data reuse, which may be relevant for different access methods.

    In general, an RDC should choose the least restrictive provision method that is still sufficient for the research data in question, so that the data remains as accessible as possible. More restrictive provision methods are associated with higher costs and greater processing effort.
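    The principle of choosing the least restrictive method can be sketched as follows. The ordering of the access paths follows the descriptions above (download as the most open form, guest scientist workstations as the most restrictive). The function name least_restrictive and the idea of passing in a set of permitted paths are assumptions made for illustration; how an RDC decides which paths are permitted for a dataset is not modeled here.

```python
from enum import IntEnum


class AccessPath(IntEnum):
    """Access paths ordered from least to most restrictive, as in the text."""
    DOWNLOAD = 1
    REMOTE_DESKTOP = 2
    REMOTE_EXECUTION = 3
    GUEST_WORKSTATION = 4


def least_restrictive(permitted: set[AccessPath]) -> AccessPath:
    """Return the most open path among those judged acceptable for a dataset."""
    if not permitted:
        raise ValueError("no access path is permitted for this dataset")
    return min(permitted)


# Hypothetical dataset that may only be offered via the two strictest paths.
choice = least_restrictive(
    {AccessPath.GUEST_WORKSTATION, AccessPath.REMOTE_EXECUTION}
)
print(choice.name)  # REMOTE_EXECUTION
```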

  • Data Packages

    Data taken into the RDC generally cannot be provided directly for secondary use: it may first need to be processed, anonymized, and enriched with standardized metadata. In most cases, the data is provided to users in the form of standardized data packages. In some cases, however, data is compiled only on demand, at the request of the user.

    Data can be prepared in various ways, depending on the target audience and intended use. The following description aligns with terminology used to distinguish between types of EU data:

    • Public Use File (PUF)
    • Campus Use File (CUF)
    • Scientific Use File (SUF)
    • Secure Use File (SecUF)

    Public Use Files (PUF) are data that can be made available to the general public, with no legal or ethical concerns. In the social sciences, these often include heavily anonymized data or structure files.

    Campus Use File (CUF) refers to data provided exclusively for educational purposes. Typically, Campus Use Files are offered by educational institutions to provide students, faculty, and staff access to specific resources relevant to their academic or administrative needs. These files may include course materials, scientific articles, research databases, or software licenses, among others.

    Access to Campus Use Files is usually restricted to individuals affiliated with the educational institution, often requiring authentication through a campus network or a specialized login system. This is done to control access to licensed or copyrighted materials and ensure they are used only by authorized individuals.

    Scientific Use File (SUF) and Secure Use File (SecUF) refer to data provided exclusively for scientific research, with SecUF containing weakly anonymized, pseudonymized, or even non-anonymized data. In practice, not all of these terms are widely adopted; SecUFs are often referred to as SUFs because they too are intended for scientific research.

    Scientific Use Files are frequently used for scientific studies, statistical analyses, or other research projects. They offer researchers access to extensive and detailed data without revealing personal information.

    Access to Scientific Use Files is typically highly regulated, requiring approval or a special agreement with the institution providing the data. This is done to ensure that the data is used only for legitimate scientific purposes and to protect individuals’ privacy.

    In principle, multiple versions of a dataset can be created to meet the needs of different user groups, such as a Campus Use File for education and a Scientific Use File for research.
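    As a rough illustration of offering several versions of one dataset, the following sketch maps each package type to its intended purpose, following the descriptions above, and looks up which available versions suit a given purpose. The PURPOSE mapping, the function packages_for, and the example dataset are hypothetical and do not reflect any specific catalog system.

```python
from enum import Enum


class PackageType(Enum):
    PUF = "Public Use File"
    CUF = "Campus Use File"
    SUF = "Scientific Use File"
    SECUF = "Secure Use File"


# Intended purpose per package type, following the descriptions in the text.
PURPOSE = {
    PackageType.PUF: "general public",
    PackageType.CUF: "education",
    PackageType.SUF: "scientific research",
    PackageType.SECUF: "scientific research",
}


def packages_for(purpose: str, available: set[PackageType]) -> list[PackageType]:
    """List the available versions of a dataset intended for a given purpose."""
    return sorted(
        (p for p in available if PURPOSE[p] == purpose),
        key=lambda p: p.name,
    )


# Hypothetical dataset offered both as a CUF and as a SUF.
available = {PackageType.CUF, PackageType.SUF}
print([p.name for p in packages_for("education", available)])
print([p.name for p in packages_for("scientific research", available)])
```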