Guidelines for authors: how to share your datasets?

By Isabelle Fabrissin On 24 July 2019 In For authors

Scientific publishing is gradually moving towards Open Science, meaning that the products of research are made openly available to the scientific community as far as possible.

Annals of Forest Science aims to develop an explicit and voluntary policy about access to the data supporting research papers and data papers. We strongly encourage the authors of research papers to provide access to their data in the form of open datasets, together with all required metadata. We discourage uploading datasets as supplementary material files, which are published online on the journal's platform under restricted access and involve the transfer of copyright to the publisher.

Data must be Findable, Accessible, Interoperable, and Reusable (FAIR). These principles put specific emphasis on enhancing the ability of machines to find and use the data automatically, in addition to supporting its reuse by individuals (https://www.nature.com/articles/sdata201618).

The FAIR guiding principles:

To be Findable:

Datasets are assigned a persistent Digital Object Identifier (DOI);
Datasets are described with rich metadata: please provide a separate file that explains what each variable is;
The dataset citation is included in the data availability statement and in the reference list of the manuscript. See our templates for manuscript and title page for guidance. The format for dataset citation should be as follows: Cantiani P, Marchi M (2017) A spatial dataset of forest mensuration collected in black pine plantations in central Italy. V1. CREA. [Dataset]. http://doi.org/10.5281/zenodo.438681. Accessed 28 March 2017.
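
The citation format above can be assembled from its parts. The following Python sketch is only an illustration: the class and field names are our own assumptions, and treating "CREA" as the hosting organisation is a guess based on the example, not an official template of the journal.

```python
from dataclasses import dataclass
from datetime import date

# English month names are spelled out so the result does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass
class DatasetCitation:
    """Parts of a dataset citation (field names are illustrative assumptions)."""
    authors: list[str]   # e.g. ["Cantiani P", "Marchi M"]
    year: int            # year the dataset was published
    title: str
    version: str         # e.g. "V1"
    organisation: str    # assumed meaning of "CREA" in the example
    doi: str             # bare DOI, e.g. "10.5281/zenodo.438681"
    accessed: date       # day on which the dataset was consulted

    def format(self) -> str:
        """Return the citation as one line in the format shown in the example."""
        accessed = (f"{self.accessed.day} {MONTHS[self.accessed.month - 1]} "
                    f"{self.accessed.year}")
        return (f"{', '.join(self.authors)} ({self.year}) {self.title}. "
                f"{self.version}. {self.organisation}. [Dataset]. "
                f"http://doi.org/{self.doi}. Accessed {accessed}.")


citation = DatasetCitation(
    authors=["Cantiani P", "Marchi M"],
    year=2017,
    title=("A spatial dataset of forest mensuration collected in black pine "
           "plantations in central Italy"),
    version="V1",
    organisation="CREA",
    doi="10.5281/zenodo.438681",
    accessed=date(2017, 3, 28),
)
print(citation.format())
```

Keeping the parts separate makes it easy to reuse the same record in both the data availability statement and the reference list.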

To be Accessible:

Datasets are openly and freely retrievable by their identifier (DOI);
Datasets are hosted by a stable and recognised open repository (see https://www.re3data.org/ and https://www.nature.com/sdata/policies/repositories);
Datasets should be submitted to discipline-specific repositories whenever possible, or to generalist or institutional repositories if no suitable community resource is available.

To be Interoperable:

Datasets are logically and consistently formatted: please avoid proprietary formats so that the files are accessible to every user. For instance, Excel spreadsheets should be converted to a plain text format (.csv or .txt);
Datasets should be described using a standard vocabulary (where available): Common vocabularies consist of lists of standardised terms that cover a broad spectrum of disciplines of relevance to your research topic.

To be Reusable:

The metadata file should include at least (1) the exact variable name as in the data file, (2) the measurement units, (3) a longer explanation of what the variable means;
Datasets are released with a clear and accessible data usage license, preferentially CC-BY 4.0 (the dataset can be shared and adapted if credits are attributed to the data providers);
Datasets are associated with detailed provenance;
Datasets should include details (version, accessibility) of any software that is required to view the data described or to replicate the analysis. If the software was written by the authors, please provide the source code together with your dataset.
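
As an illustration of the metadata requirement above, the following Python sketch checks that every column of a data file is described in the metadata file with its name, units and explanation. The metadata column names ("variable", "units", "description"), the function name and the sample data are hypothetical; the journal does not prescribe a particular layout.

```python
import csv
import io

# Assumed column names for the metadata file; adapt them to your own layout.
REQUIRED_FIELDS = ("variable", "units", "description")


def check_data_dictionary(data_text: str, metadata_text: str) -> list[str]:
    """Return a list of problems; an empty list means every variable is documented."""
    data_header = next(csv.reader(io.StringIO(data_text)), [])
    meta_reader = csv.DictReader(io.StringIO(metadata_text))

    missing = [f for f in REQUIRED_FIELDS if f not in (meta_reader.fieldnames or [])]
    if missing:
        return [f"metadata file lacks column(s): {', '.join(missing)}"]

    # Index the metadata rows by the exact variable name.
    documented = {row["variable"]: row for row in meta_reader}

    problems = []
    for name in data_header:
        entry = documented.get(name)
        if entry is None:
            problems.append(f"'{name}' is not described in the metadata file")
            continue
        for field in ("units", "description"):
            if not (entry[field] or "").strip():
                problems.append(f"'{name}' has no {field}")
    for name in documented:
        if name not in data_header:
            problems.append(f"'{name}' is described but absent from the data file")
    return problems


# Hypothetical example: 'height_m' is in the data but missing from the metadata.
data = "plot_id,dbh_cm,height_m\nP01,23.5,18.2\n"
metadata = (
    "variable,units,description\n"
    "plot_id,none,Unique identifier of the sampling plot\n"
    "dbh_cm,cm,Stem diameter at breast height\n"
)
for problem in check_data_dictionary(data, metadata):
    print(problem)
```

To check real files, read them first, for example with `pathlib.Path("data.csv").read_text(encoding="utf-8")`.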

Organizing datasets in spreadsheets:

To increase the accessibility and reusability of spreadsheet data, please follow the practical recommendations listed below (for more information see: Karl W. Broman & Kara H. Woo (2018), https://doi.org/10.1080/00031305.2017.1375989).

Spreadsheets should preferably be submitted in CSV or TAB format;
Organize your spreadsheet data as a single big rectangle with rows corresponding to subjects and columns corresponding to variables;
Use consistent terms and codes for nominal variables and variable names. Watch out for extra spaces within cells and inconsistent capitalisation, and avoid special characters (except for underscores and hyphens);
Do not merge cells or leave cells blank; use a common code such as NA for missing data;
Do not use font colour or highlighting as data but rather add another column with an indicator variable;
Do not submit multiple worksheets within a spreadsheet but rather submit each worksheet and table as a separate file;
Do not include calculations or graphs in the raw data file;
Do not include more than one piece of data in a cell (e.g. a value and its units);
Give each column a descriptive heading;
Ensure your data start in the first cell (top-left corner) of the sheet.
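
Several of the recommendations above can be checked automatically before submission. The Python sketch below is a minimal illustration, not an official validation tool of the journal: the function name and the sample tables are hypothetical, and it only covers the rectangular shape, column headings, blank cells and stray spaces.

```python
import csv
import io
import re

# Headings restricted to letters, digits, underscores and hyphens (an assumed
# reading of "avoid special characters").
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_table(text: str, delimiter: str = ",") -> list[str]:
    """Return a list of problems found in a delimited text table."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []

    seen = set()
    for col, name in enumerate(header, start=1):
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"column {col}: heading {name!r} is empty or "
                            "contains spaces or special characters")
        if name in seen:
            problems.append(f"column {col}: heading {name!r} is duplicated")
        seen.add(name)

    # Record 1 is the header, so data records are numbered from 2.
    for record, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"record {record}: {len(row)} cells, "
                            f"expected {len(header)}")
        for col, cell in enumerate(row, start=1):
            if cell == "":
                problems.append(f"record {record}, column {col}: blank cell (use NA)")
            elif cell != cell.strip():
                problems.append(f"record {record}, column {col}: "
                                "leading or trailing space")
    return problems


# Hypothetical tables: the first follows the recommendations, the second does not.
good = "plot_id,species,dbh_cm\nP01,Pinus_nigra,23.5\nP02,Pinus_nigra,NA\n"
bad = "plot id,species\nP01, Pinus_nigra\nP02,\n"
print(check_table(good))
for problem in check_table(bad):
    print(problem)
```

For tab-separated files, pass `delimiter="\t"`. Recommendations about colours, merged cells, embedded graphs and multiple worksheets concern the spreadsheet file itself and cannot be checked once the data are exported to plain text.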

Read the related posts on our blogs:

Annals of Forest Science promotes Open Science by publishing data papers

Publishing data papers in Annals of forest science: detailed guidelines for a smooth preparation and submission
