Data provided by private companies for research purposes is becoming increasingly important and is an excellent basis for a range of methodological and content-related questions in social and economic research, but it is still rarely used. In some cases, such data makes it possible to measure economic variables in real time, to achieve a higher level of detail, or to look inside the black box of decision-making in markets and companies.
Firm data is used, for example, in innovation economics, sustainability research, and many other areas of economic and social research.
Firm data can be divided into two categories (Gottschalk et al., 2023):
Source: Gottschalk, Sandra et al. (2023): Unternehmensdaten: Nutzbarkeit verbessern, Wirtschaftsdienst, ISSN 1613-978X, Sciendo, Warsaw, Vol. 103, Iss. 11, pp. 750-753, https://doi.org/10.2478/wd-2023-0208
Research data centres (RDC) face several potential problems when working with firm data. Firm data has become accessible to research mainly as a result of advancing digitalisation and growing computing capacity. Because this data has only recently gained importance in science, research data management for this type of data is not yet sufficiently documented.
Access to firm data can be difficult due to legal restrictions, data protection concerns and ownership considerations on the part of companies. In addition, there are ‘cultural’ differences between the scientific and corporate worlds that need to be bridged, and trust needs to be built with potential data cooperation partners.
Because of the unclear legal situation, potentially restrictive licensing conditions imposed by the data-providing companies and the resulting uncertainty, firm data poses specific challenges not only for individual researchers but also for data curation in RDC. Several initiatives, such as New Options for Access to Firm Data by KonsortSWD, aim to establish collaborations that enable RDC to offer this data for FAIR reuse.
The curation aspects that are particularly relevant to firm data mainly concern the first steps of the data curation life cycle: Create & Receive and Appraise & Select.
Firm data can suffer from quality issues such as inaccuracies, inconsistencies and missing values. A special feature of firm data is that it was not originally collected for (independent) research. This creates particular challenges, such as changes in the structure of the data over time. Google, for example, calculated its flu index from geolocated search queries. The difficulty was that the total number of Google queries increased over time and that search behaviour, and thus the keywords used, also changed. Such indices, and the original data in general, must therefore be carefully maintained and updated.
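To illustrate why a growing total query volume distorts raw counts, the following minimal sketch compares hypothetical monthly counts of flu-related queries with their share of all queries. All figures are invented for illustration, and the function name `query_share` is hypothetical; this is not Google's method. If total volume grows while the share stays constant, the raw count alone would wrongly suggest rising flu activity.

```python
# Hypothetical monthly figures, invented purely for illustration.
total_queries = [1_000_000, 1_200_000, 1_500_000, 2_000_000]
flu_queries = [2_000, 2_400, 3_000, 4_000]


def query_share(keyword_counts, totals):
    """Return keyword queries as a share of all queries for each period."""
    if len(keyword_counts) != len(totals):
        raise ValueError("both series must cover the same periods")
    return [k / t for k, t in zip(keyword_counts, totals)]


shares = query_share(flu_queries, total_queries)
print(flu_queries)  # raw counts double over the four months
print(shares)       # the share stays at 0.002 in every month
```

Normalising by total volume addresses only the first of the two problems described above; changes in search behaviour and keywords require revisiting the keyword list itself.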
As a data curator, you must pay particular attention to contractual agreements with cooperation partners (companies or firms) when receiving research data. These agreements often regulate the handling of sensitive or proprietary economic data and define the rights and obligations regarding the use, storage and transfer of data. Curators must ensure that all contractual provisions, such as data protection guidelines, confidentiality agreements and rights of use, are complied with to maintain legal and ethical standards and avoid potential conflicts.
In practice, cooperation partners usually set certain conditions for the provision of data. One example is an agreement that the data in question will only be made available to the scientific community after the partner has published its quarterly report. The rationale behind this rule is that premature publication of the data or related information could influence the company’s share price before the quarterly report is made public. To avoid such undesirable consequences, data curators must strictly monitor these timeframes and the associated contractual obligations and ensure that the data is released only once the cooperation partner’s conditions have been met.
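A release check of this kind can be made explicit in a curation workflow. The sketch below is a minimal illustration under stated assumptions. The `EmbargoedRelease` record, its fields and the dataset identifier are hypothetical. It assumes the partner's report publication date is recorded once known. It also makes a conservative design choice: the publication day itself is treated as still embargoed, so release is allowed from the following day onwards.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EmbargoedRelease:
    """Hypothetical record linking a data delivery to the partner's report."""

    dataset_id: str
    quarter: str                         # e.g. "2024-Q1"
    report_published_on: Optional[date]  # None while the report is pending

    def may_release(self, today: date) -> bool:
        # Nothing is released while the quarterly report is still pending.
        if self.report_published_on is None:
            return False
        # Conservative assumption: the publication day itself stays
        # embargoed, so release is allowed from the next day onwards.
        return today > self.report_published_on


pending = EmbargoedRelease("listings-2024-q1", "2024-Q1", None)
published = EmbargoedRelease("listings-2024-q1", "2024-Q1", date(2024, 5, 15))

print(pending.may_release(date(2024, 6, 1)))     # False: report not yet out
print(published.may_release(date(2024, 5, 15)))  # False: publication day
print(published.may_release(date(2024, 5, 16)))  # True
```

Whether same-day release is acceptable depends on the actual contract. The check only helps if the publication date is entered reliably, so the date itself should come from the cooperation partner or its published report.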
Data protection issues pose the greatest challenge when curating firm data.
Anonymisation is subject to special requirements. Firm data may contain confidential information (protection of know-how) or be subject to strict confidentiality and non-disclosure agreements. In the case of company data, it is important to note that, unlike most ‘classic’ research data, it is not only the observation unit (e.g. a person or a household) that is worthy of protection, but also the data provider itself.
One example of protection aimed at preserving trade secrets is the RWI-GEO-RED data set from the Ruhr Research Data Centre (RDC Ruhr) at the RWI – Leibniz Institute for Economic Research. It contains real estate listings collected from ImmoScout24. The aggregated number of listings at a given point in time indicates the company’s performance, which makes it sensitive, stock-market-relevant information for the company. RDC Ruhr therefore publishes the PUF (Public Use File) only without the most recent periods (the ‘current edge’) and with altered case numbers.
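How RDC Ruhr actually prepares its PUF is not described here. The following minimal sketch only illustrates the two measures named above, under invented assumptions. Listings on or after a hypothetical `cutoff` date are withheld as the current edge. A random subsample with a hypothetical `keep_share` is then drawn, so the published case numbers no longer equal the true listing counts. The function name and the record schema (a `listed_on` date per listing) are also assumptions.

```python
import random
from datetime import date


def make_public_use_file(listings, cutoff, keep_share, seed):
    """Hypothetical PUF sketch: drop the current edge, then subsample.

    listings:   list of dicts, each with at least a "listed_on" date
    cutoff:     listings on or after this date are withheld (current edge)
    keep_share: fraction of the remaining listings to keep, 0 < keep_share <= 1
    seed:       fixed seed so the PUF can be reproduced internally
    """
    if not 0 < keep_share <= 1:
        raise ValueError("keep_share must be in (0, 1]")
    rng = random.Random(seed)
    # Step 1: withhold the most recent periods.
    older = [row for row in listings if row["listed_on"] < cutoff]
    # Step 2: alter case numbers by drawing a random subsample.
    n_keep = round(len(older) * keep_share)
    return rng.sample(older, n_keep)


# Invented example: 120 listings, ten per month in 2023.
listings = [{"id": i, "listed_on": date(2023, 1 + i % 12, 1)} for i in range(120)]
puf = make_public_use_file(listings, cutoff=date(2023, 10, 1), keep_share=0.8, seed=42)
print(len(puf))  # 72: 90 listings from January to September, 80 % of them kept
```

In practice, the sampling share would itself have to remain confidential. Otherwise, users could simply scale the published counts back up to the true figures.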
Researchers who use firm data for their research usually sign confidentiality agreements. Certain points must be observed in this regard:
Addressing the above-mentioned aspects requires cooperation between RDC, company data providers, legal experts and supervisory authorities to ensure the responsible and ethical reuse and archiving of company data for research purposes.