Research data management
What are research data?
Research data are the information, in any format (digital and/or paper; numerical, descriptive, audio or video), collected and used during a research activity and needed to validate its results.
Examples include experimental results (positive or negative), observations, published and unpublished sources, bibliographical references, software and code, texts and objects. Data can be raw or processed.
How are data managed?
Research Data Management (RDM) does not necessarily imply opening the data. It means organizing the collection and storage of data so that they remain adequately preserved, traceable and comprehensible over time, including to people who did not take part in the collection.
Research data must be managed according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, to make knowledge easy to trace, to ensure that it circulates and to encourage innovation.
To make the data Findable / Traceable
- a persistent identifier must be assigned to the dataset (e.g. DOI or handle)
- the metadata describing the dataset must be comprehensive, accurate and indexed by search engines
To make the data Accessible
- the persistent identifier associated with the dataset correctly resolves to the metadata page
- the metadata describing the dataset are public, visible and indexable even if the data are not open access
To make the data Interoperable
- the data are made available in open or widespread formats
- the metadata follow recognized standard patterns
- there are links to other resources linked to the data (e.g. publications or technical reports)
To make the data Reusable
- the data are described in a way that is easy to understand
- a license that permits reuse has been assigned to the dataset
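The FAIR conditions listed above can be turned into a simple self-assessment. The following is a minimal sketch, not an official FAIR evaluation tool: the `DatasetRecord` class, its field names, the `OPEN_FORMATS` set and the `fair_checklist` function are all hypothetical and introduced only for illustration. It checks only what can be read from a local description of the dataset; it cannot verify that metadata are indexed by search engines or that the identifier actually resolves.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative, non-exhaustive set of open or widespread formats (an assumption).
OPEN_FORMATS = {"csv", "txt", "json", "xml"}


@dataclass
class DatasetRecord:
    """Hypothetical dataset description; all field names are illustrative."""
    persistent_id: str = ""            # e.g. a DOI or handle
    metadata_public: bool = False      # metadata visible even if the data are closed
    file_formats: List[str] = field(default_factory=list)
    metadata_standard: str = ""        # name of the metadata schema followed
    related_resources: List[str] = field(default_factory=list)
    description: str = ""
    license: str = ""


def fair_checklist(record: DatasetRecord) -> Dict[str, bool]:
    """Map each checklist item to True (satisfied) or False (still to do)."""
    formats_ok = bool(record.file_formats) and all(
        fmt.lower() in OPEN_FORMATS for fmt in record.file_formats
    )
    return {
        "findable: persistent identifier assigned": bool(record.persistent_id.strip()),
        "accessible: metadata publicly visible": record.metadata_public,
        "interoperable: open or widespread formats": formats_ok,
        "interoperable: standard metadata schema": bool(record.metadata_standard.strip()),
        "interoperable: links to related resources": bool(record.related_resources),
        "reusable: understandable description": bool(record.description.strip()),
        "reusable: license assigned": bool(record.license.strip()),
    }


record = DatasetRecord(
    persistent_id="10.1234/example",   # placeholder, not a real DOI
    metadata_public=True,
    file_formats=["CSV", "json"],
    metadata_standard="DataCite",
    description="Hourly temperature readings, 2020-2022.",
)
for item, ok in fair_checklist(record).items():
    print("[x]" if ok else "[ ]", item)
```

In this example the record has no license and no links to related resources, so those two items are reported as still to do.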
The methods of data management, enhancement and preservation over time during and after the research are described in the Data Management Plan (DMP).
The data management plan (DMP)
- it is required by various funding bodies, including the European Commission (Horizon Europe programme), to which it must be delivered within the first 6 months of the funded project
- it must already be thought out in the research planning phase
- it must be revised during the project, every time there are changes in the nature of the data or in the methods of their collection and management
- it must be updated regularly, with periodic reviews scheduled from the first version onwards
- it must be shared with all the researchers involved in the project
- it must be concise, schematic and precise (use tables and bullet points as much as possible, write only what you are sure of)
- it describes:
  - what types of data are collected and analysed
  - what formats and what software are used
  - who is the author of the dataset(s) and who is responsible for keeping the plan updated
  - any issues relating to ethics, the management of personal and sensitive data, privacy and confidentiality requirements
  - how to share data with collaborators
  - how to protect the data and how often to make backup copies
  - where and how to store the data in the long term, and at what preservation cost
  - who can have access to the data and in what way (open to all, accessible on request), as defined by licenses and data reuse rules
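The items a DMP describes can be kept as a small structured skeleton, which makes it easy to see at each periodic review which parts are still empty. The sketch below is only an illustration under stated assumptions: the section keys, the `DMP_SECTIONS` mapping and the `missing_sections` function are hypothetical names, not part of any funder's template.

```python
# Hypothetical section keys, one per item in the list above, each with a prompt.
DMP_SECTIONS = {
    "data_types": "What types of data are collected and analysed?",
    "formats_and_software": "What formats and software are used?",
    "authors_and_plan_owner": "Who authors the dataset(s) and keeps the plan updated?",
    "ethics_and_personal_data": "Any ethics, personal data or confidentiality issues?",
    "sharing_with_collaborators": "How are data shared with collaborators?",
    "protection_and_backups": "How are data protected and how often backed up?",
    "long_term_storage_and_costs": "Where are data stored long term, and at what cost?",
    "access_and_licenses": "Who can access the data, and under which license?",
}


def missing_sections(plan):
    """Return the keys of sections that are absent or left blank in the plan."""
    missing = []
    for key in DMP_SECTIONS:
        value = plan.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# A first version of a plan: only what the team is already sure of is filled in.
draft_plan = {
    "data_types": "Survey responses (tabular) and interview recordings (audio).",
    "formats_and_software": "CSV and WAV; analysis in R.",
    "protection_and_backups": "   ",  # blank entries count as missing
}

for key in missing_sections(draft_plan):
    print("To complete:", DMP_SECTIONS[key])
```

Leaving a section empty rather than guessing is consistent with the advice to write only what you are sure of; the list of missing sections then serves as the agenda for the next review.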
Online services for compiling a data management plan:
Grid for the development of the research data management plan prepared by the Italian Open Science Support Group (IOSSG).
Online services for calculating data management costs:
Open source application for removing personal information from datasets:
Why make the data available?
Open access to scientific research data favours the progress of knowledge and the reproducibility of research, reduces duplication and increases transparency.
Data in themselves are not creative works and are not protected by copyright. Unless there are specific, justified grounds for protection (confidentiality restrictions, privacy protection, industrial or commercial exploitation), they can therefore be reused or redistributed without restrictions, under public-domain dedications or licenses that require only attribution.
International scientific journals and research funding programmes, in order to allow the validation of scientific publications, increasingly request that:
- research data are made available in dedicated public archives;
- the documentation needed to understand the tools and software used to generate and process the data (README files) is also archived, so that the data remain accessible over time and do not become difficult to decode after some years as standards and technologies change;
- the data are cross-linked to the related publications (this is made possible by depositing the data when the publication is accepted and inserting the persistent identifier of the dataset in the final version of the publication).
When, how and where should the data be deposited?
The data must be deposited in the chosen archive:
- upon acceptance of the publication in order to attribute to the dataset a persistent identifier (DOI, handle, etc.) to be mentioned within the publication;
- at the latest when the research results are published.
In the first case, an embargo period can be chosen that keeps the dataset closed until the results are published in the chosen venue.
The data must be deposited together with descriptive metadata: author(s) and contributor(s), title, date of publication, abstract, references to any funding, any citation of the publications to which they refer, the distribution license, the level of access and any embargo period.
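The metadata listed above can be checked for completeness before deposit. The following is a rough sketch, assuming a plain dictionary as the metadata record: the key names, the `ACCESS_LEVELS` labels and the `validate_metadata` function are hypothetical, and real repositories define their own schemas and required fields. Contributors, funding and related publications are treated as optional here, since the text qualifies them with "any".

```python
from datetime import date

# Hypothetical key names for the fields treated as always required.
REQUIRED_FIELDS = ["authors", "title", "publication_date",
                   "abstract", "license", "access_level"]
# Illustrative access-level labels (an assumption, not a standard vocabulary).
ACCESS_LEVELS = {"open", "embargoed", "restricted", "closed"}


def validate_metadata(meta, today=None):
    """Return a list of human-readable problems; an empty list means no issue found."""
    today = today or date.today()
    problems = []
    for name in REQUIRED_FIELDS:
        if not meta.get(name):
            problems.append(f"missing field: {name}")

    level = meta.get("access_level")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unknown access level: {level}")

    # An embargo period only makes sense for an embargoed dataset.
    embargo_end = meta.get("embargo_end")
    if level == "embargoed":
        if embargo_end is None:
            problems.append("embargoed dataset needs an embargo_end date")
        elif embargo_end <= today:
            problems.append("embargo_end is not in the future")
    elif embargo_end is not None:
        problems.append("embargo_end set but access level is not 'embargoed'")
    return problems


metadata = {
    "authors": ["A. Researcher"],          # placeholder name
    "title": "Example survey dataset",
    "publication_date": date(2024, 5, 1),
    "abstract": "Responses to a pilot survey.",
    "license": "CC BY 4.0",
    "access_level": "embargoed",           # embargo_end deliberately left out
}
for problem in validate_metadata(metadata, today=date(2024, 5, 1)):
    print(problem)
```

Passing `today` explicitly keeps the check reproducible; when it is omitted the current date is used.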
It is advisable to check if there is a standard to follow in your discipline (e.g. with Repository Finder) and to archive the data in the thematic / disciplinary repositories of your scientific community, which constitute a point of reference and facilitate the traceability and reuse of the deposited data.
These archives must meet certain requirements:
- public governance
- long-term retention of data
- use of open licenses, such as Creative Commons
- standard metadata
- attribution of a persistent identifier (DOI, handle, URN, etc.)
- cross-linking with related scientific publications
- reuse statistics
It is possible to deposit the data in more than one repository, but it is very important always to use the same persistent identifier (for example the DOI) for the dataset.
Template and guide for the Data Management Plan (Science Europe): template pp. 9-10; guide pp. 17-25
Grid for the development of the research data management plan (IOSSG)
Guidelines for the application of the FAIR principles to the management and reuse of data
Servizio Valutazione della ricerca e Open Science