By KU Leuven Research Data Management Support desk on Feb 18, 2019
The starting point of this paper lies in the interplay between two related yet very specific concepts, OPEN and FAIR, which we analyse in an academic context. The FAIR principles (ensuring that data are Findable, Accessible, Interoperable and Reusable) are gradual and thus entirely compatible with the idea that research data should be ‘as open as possible, as closed as necessary’. We will use the interaction between FAIR and OPEN to assess the FAIRness of repositories for research data and publications. We argue that the FAIRness of a repository depends on the technical standards it uses, but is also conditioned by policy issues.
Data have always been the primary material of science, underpinning research results and scientific publications. Since the advent of big data, predictive analytics and artificial intelligence, research has become increasingly data-driven. Scientists can now detect knowledge gaps by analysing the available data, and new hypotheses emerge from recurrent patterns. It is therefore increasingly important for researchers to share their data.
Hence, open science is generally considered a driver of innovation that guarantees more transparent, verifiable and reproducible research, and FAIR data management has become a major concern of the academic community (FAIR indicating that data should be Findable, Accessible, Interoperable and Reusable). Of course, the practice of data management is impossible without proper reflection on the place where data are stored, curated and made discoverable for others. We would like to address this topic with a contribution on the openness and FAIRness of repositories.
A preliminary element in this discussion is the distinction between two related yet very specific concepts: OPEN and FAIR. ‘Open’ has become almost synonymous with the free, costless availability of research results. As such, the concept contrasts with a culture of science locked behind paywalls, usually controlled by commercial legacy publishers. According to the Budapest declaration, open access also means that scholarly publications can be processed in some way, for instance by including them in databases or by text mining. The effective use that can be made of research results and data thus depends, among other things, on interoperability, a point the pioneers of the open access movement already made.
More recently, scholars of the FORCE11 research community have defined a set of FAIR principles that describe more precisely which conditions have to be met to make research results exploitable by others. Despite the prominent role of sharing and reuse in the FAIR principles, it is also recognized that not all data can be open; privacy regulations and intellectual property potential are the classic examples in this respect. The FAIR principles thus appear to be very modular, which makes them somewhat different from the claims of the open access movement. Furthermore, it should be noted that although the FAIR principles do stress the importance of accessibility, they do so in a technical way, avoiding the difficult issues that are central to the open access debate, such as pricing and the question of whether accessible should mean ‘free’ to reuse.
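To make the technical character of the FAIR principles concrete, here is a minimal sketch in Python of how FAIR criteria can be checked mechanically against a metadata record. The field names, the placeholder DOI and the check itself are purely illustrative assumptions, not a real repository schema or an official FAIR assessment tool.

```python
# Toy metadata record; field names and the DOI are illustrative placeholders.
record = {
    "identifier": "doi:10.1234/example",  # Findable: persistent, globally unique identifier
    "title": "Example dataset",           # Findable: rich, searchable metadata
    "format": "text/csv",                 # Interoperable: open, standard format
    "license": "CC-BY-4.0",               # Reusable: explicit reuse conditions
}

# Fields a hypothetical FAIRness check might require.
REQUIRED = {"identifier", "title", "format", "license"}

def missing_fair_fields(rec):
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(f for f in REQUIRED if not rec.get(f))

print(missing_fair_fields(record))            # []
print(missing_fair_fields({"title": "x"}))    # ['format', 'identifier', 'license']
```

The point of the sketch is that FAIRness, unlike openness, is machine-checkable at this level: a record either carries a persistent identifier and an explicit license or it does not, independently of whether access to the data itself is free.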
As the FAIR principles are above all a basis for technical guidelines, they play a key role in the repositories where researchers who are sensitized to open science and aware of reproducibility questions register their (meta)data. In some fields, repositories are the standard way for researchers to share their data. These repositories can be generic or discipline-specific and are managed by a wide variety of actors in the field of scholarly communication. Tools such as descriptive websites and trustworthiness labels are available to inform researchers about repositories. However, the available information remains sparse, and the use made of it within research cultures seems highly uneven.
In fact, it is not always clear how data repositories will evolve over time. Long-term availability is nevertheless essential to ensure the accessibility of data. That is why policy issues surrounding repositories need to be addressed alongside the technical standards that apply when researchers want to upload their data to an international, non-institutional repository. Furthermore, the economic model followed by the repository has to be taken into account. While storage and curation obviously have a cost, it is equally clear that it is in the interest of the scientific community to ensure that pricing remains reasonable and transparent.
We argue that such non-technical criteria should be considered when assessing the FAIRness of repositories, and that researchers should receive appropriate information in this respect. For now, a host of questions arise when one tries to make a thoughtful choice of repository. Which partners manage and contribute to the repositories? What is their aim, and how does it fit in with their other activities? Which service level will the repositories be able to offer in the long run, and at what price?