Phenotypes

Phenotypes are the observable characteristics of an organism that result from interactions between its genes and the environment in which it grows. Plant phenotyping is used in several fields, such as breeding programs, biological or agronomic research in multi-location environments, germplasm bank characterisation, and biorefinery. Phenotypes (traits and their associated values) can be shared across several fields of expertise or be specific to one field. Furthermore, phenotypes interact with each other: in biorefineries, for instance, pretreatments affect the glucose yield, and in breeding, grain yield depends on the number of grains and the grain weight.

This section describes the standard formats used by the community to design nurseries and field trials, and the minimum metadata required to document them on various platforms.

Recommendations

Summary

Data format: use data matrices in CSV or Excel
Metadata and vocabularies: use complete metadata, at least for germplasm and observation variables
Raw data: keep curated data (with outliers checked)

1. Data formats

We recommend following a minimum format principle: data matrices, plus metadata describing at least the variables (traits along with their method, unit and scale, or environmental variables) and the germplasm. This principle has been formalized in the MIAPPE recommendations.
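The principle of "data matrix plus variable and germplasm metadata" can be checked mechanically. The following is a minimal sketch, assuming a CSV matrix with one row per plot and one column per observation variable; the column names, the `check_matrix` helper and all values are hypothetical and are not part of MIAPPE itself.

```python
import csv
import io

# Hypothetical data matrix: one row per plot, one column per observation variable.
DATA_CSV = """plot_id,germplasm_id,plant_height_cm,grain_yield_t_ha
P001,G-0001,92.5,7.1
P002,G-0002,88.0,6.4
"""

# Hypothetical variable metadata: each variable = trait + method + scale (unit).
VARIABLES = {
    "plant_height_cm": {"trait": "Plant height", "method": "Measured with a ruler", "scale": "cm"},
    "grain_yield_t_ha": {"trait": "Grain yield", "method": "Plot harvest", "scale": "t/ha"},
}

# Hypothetical germplasm metadata keyed by identifier.
GERMPLASM = {"G-0001": {"name": "Line A"}, "G-0002": {"name": "Line B"}}

# Identifier columns that are not observation variables.
ID_COLUMNS = {"plot_id", "germplasm_id"}


def check_matrix(text, variables, germplasm):
    """Return problems: variable columns without metadata, or unknown germplasm IDs."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    for column in reader.fieldnames:
        if column not in ID_COLUMNS and column not in variables:
            problems.append(f"variable without metadata: {column}")
    for row in reader:
        if row["germplasm_id"] not in germplasm:
            problems.append(f"unknown germplasm: {row['germplasm_id']}")
    return problems


print(check_matrix(DATA_CSV, VARIABLES, GERMPLASM))  # []
```

Keeping the metadata outside the matrix, keyed by column name and germplasm ID, means the same matrix can be validated against any richer metadata source later.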

MIAPPE ISA-Tab is an implementation of this principle. It consists of a single zip archive containing data files and metadata files, the latter being used for data discovery and interoperability. More information can be found on this dedicated page or in this presentation from Plant and Animal Genome 2015. It is currently well suited to being generated by software. The ISA-Tab format, its phenotype-specific configuration and the associated tools are still being improved.
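To illustrate the "one zip archive with data files and metadata files" layout, here is a minimal sketch using only the standard library. It relies on the ISA-Tab naming convention of `i_`, `s_` and `a_` prefixes for investigation, study and assay files; the `d_` data file name, the helper names and the file contents are hypothetical placeholders, and the result is not a complete or validated MIAPPE ISA-Tab archive.

```python
import io
import zipfile


def build_archive(files):
    """Bundle a {file name: text content} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def metadata_files(archive_bytes):
    """List the ISA-Tab metadata files (i_, s_, a_ prefixes) found in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.startswith(("i_", "s_", "a_")))


# Placeholder, tab-separated contents: real ISA-Tab files carry many more sections and columns.
FILES = {
    "i_investigation.txt": "Investigation Title\tHypothetical trial\n",
    "s_study.txt": "Source Name\tSample Name\nG-0001\tP001\n",
    "a_phenotyping.txt": "Sample Name\tRaw Data File\nP001\td_observations.txt\n",
    "d_observations.txt": "plot_id\tplant_height_cm\nP001\t92.5\n",
}

archive_bytes = build_archive(FILES)
print(metadata_files(archive_bytes))
# ['a_phenotyping.txt', 'i_investigation.txt', 's_study.txt']
```

Separating metadata files from data files by name lets a discovery tool index the metadata without reading the (possibly large) data matrices.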

The Breeding API, described below, is a MIAPPE-compliant web service API specification.

There is also an emerging initiative to produce a Semantic Web version of MIAPPE through a transformation from the Breeding API to RDF.
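The idea of a Breeding API to RDF transformation can be sketched as turning a JSON record into triples. The following minimal example, assuming a BrAPI-style germplasm record with `germplasmDbId`, `germplasmName` and `accessionNumber` fields, writes N-Triples lines by hand. The `example.org` namespaces are hypothetical placeholders; this is not the mapping or vocabulary used by the initiative itself.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces, for illustration only.
BASE = "http://example.org/germplasm/"
VOCAB = "http://example.org/vocab#"


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def germplasm_to_ntriples(record):
    """Turn one BrAPI-style germplasm record (a dict) into N-Triples lines."""
    subject = f"<{BASE}{record['germplasmDbId']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{VOCAB}Germplasm> ."]
    for key in ("germplasmName", "accessionNumber"):
        if key in record:
            lines.append(f'{subject} <{VOCAB}{key}> "{escape_literal(record[key])}" .')
    return "\n".join(lines)


print(germplasm_to_ntriples({"germplasmDbId": "1", "germplasmName": "Line A"}))
```

A production transformation would use an RDF library and shared ontologies rather than ad hoc property names; the sketch only shows how each JSON field becomes one subject-predicate-object statement.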

See the germplasm recommendations for data format regarding germplasm information.

2. Metadata and vocabularies

Observation variables

Observation variables include both trait variables and environmental variables.

We recommend using existing variables, listed in the vocabularies and ontologies below.
To create new observation variables, we recommend using the Crop Trait Dictionary Upload Template available on the Crop Ontology website. Each new variable must include all mandatory fields (trait name, description, abbreviation, synonyms, methods and scales) so that it can be created and shared. The most important field in this template is the Trait ID, which must remain stable and never be modified. It must also never be deleted; if necessary, it can be deprecated instead. This way, the ID can be used in trials and remain valid in the future.
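The rules above (mandatory fields, stable IDs that are never modified or deleted, deprecation instead of deletion) can be expressed as a small registry. This is a minimal sketch under those assumptions only; the `VariableRegistry` class, its methods and the `TRAIT:0001` identifier are hypothetical and do not reproduce the Crop Ontology template or its ID scheme.

```python
from dataclasses import dataclass, replace

MANDATORY = ("name", "description", "abbreviation", "synonyms", "method", "scale")


@dataclass(frozen=True)
class Variable:
    trait_id: str
    name: str
    description: str
    abbreviation: str
    synonyms: tuple
    method: str
    scale: str
    deprecated: bool = False


class VariableRegistry:
    """Registry where trait IDs are stable: never modified, redefined or deleted."""

    def __init__(self):
        self._variables = {}

    def add(self, trait_id, **fields):
        missing = [name for name in MANDATORY if not fields.get(name)]
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")
        if trait_id in self._variables:
            raise ValueError(f"{trait_id} already exists and cannot be redefined")
        self._variables[trait_id] = Variable(trait_id, **fields)

    def deprecate(self, trait_id):
        # The frozen record is replaced by a deprecated copy; the ID stays valid.
        self._variables[trait_id] = replace(self._variables[trait_id], deprecated=True)

    def delete(self, trait_id):
        raise PermissionError(f"{trait_id} cannot be deleted; deprecate it instead")

    def get(self, trait_id):
        return self._variables[trait_id]


registry = VariableRegistry()
registry.add("TRAIT:0001", name="Plant height", description="Height from ground to top of plant",
             abbreviation="PH", synonyms=("height",), method="Measured with a ruler", scale="cm")
registry.deprecate("TRAIT:0001")
print(registry.get("TRAIT:0001").deprecated)  # True
```

Because old trials keep pointing at `TRAIT:0001`, a deprecated entry still resolves; only new trials are expected to stop using it.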

For nursery and trial metadata and descriptions, we recommend using the Crop Research Ontology, which describes the terms related to nurseries and trials, field management, field environments, study design, etc. These terms are actively being adapted for wider use.

For biorefinery, we recommend using the Biorefinery ontology, which describes the concepts and terms associated with biomass composition and characterization (crystallinity, surface area, particle size, porosity, etc.), physico-chemical pretreatments, enzymatic hydrolysis, and descriptions of experimental processes.

Recommended variable ontologies and vocabularies

For the difference between metadata, ontologies and vocabularies, see the dedicated page.

3. Raw data

We recommend sharing at least clean, documented raw data, such as plant height or leaf area.
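Cleaning usually starts by checking values against the variable's scale and flagging suspicious ones. The sketch below assumes a numeric scale with known bounds and uses the modified z-score (0.6745 times the distance to the median, divided by the median absolute deviation, with 3.5 as a common cut-off) as a stand-in outlier rule; this rule and the `flag_values` helper are our choice for illustration, not a method prescribed here. Values are only flagged for a curator to check, not removed.

```python
from statistics import median


def flag_values(values, low, high, threshold=3.5):
    """Flag values for review: outside the scale range, or far from the median.

    Uses the modified z-score 0.6745 * |x - median| / MAD; values are only
    flagged, never removed, so a curator can check them.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    flags = []
    for i, v in enumerate(values):
        if not low <= v <= high:
            flags.append((i, v, "outside scale range"))
        elif mad > 0 and 0.6745 * abs(v - med) / mad > threshold:
            flags.append((i, v, "possible outlier"))
    return flags


# Hypothetical plant heights (cm) on a 0-300 cm scale.
heights_cm = [92.5, 88.0, 90.1, 91.3, 89.7, 180.0, -1.0]
print(flag_values(heights_cm, low=0, high=300))
# [(5, 180.0, 'possible outlier'), (6, -1.0, 'outside scale range')]
```

The median-based rule is used rather than mean and standard deviation because a single extreme value would otherwise inflate the spread and hide itself.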

The phenotype data lifecycle begins with acquisition, followed by cleaning, elaboration and analysis. Elaboration combines several variables, such as phenological stages and traits, to produce elaborated (computed) variables that are used as input for analysis software. For instance, plant height measurements and phenology can be combined to get height at flowering. Because different elaborated variables are produced for different purposes, it is important to be able to easily generate new ones from the raw data.
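As an example of elaboration, "height at flowering" can be derived from raw height measurements and a flowering date. The sketch below assumes height was measured on several dates for a plot and estimates it on the flowering date by linear interpolation; the interpolation choice, the `value_at` helper and the dates and values are hypothetical.

```python
from datetime import date


def value_at(target, series):
    """Estimate a trait value on `target` by linear interpolation.

    `series` is a list of (date, value) raw measurements for one plot.
    """
    points = sorted(series)
    for d, v in points:
        if d == target:
            return v
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if d0 < target < d1:
            fraction = (target - d0).days / (d1 - d0).days
            return v0 + fraction * (v1 - v0)
    raise ValueError("target date is outside the measured period")


# Hypothetical raw data for one plot: plant height (cm) on three dates.
heights = [(date(2015, 5, 1), 40.0), (date(2015, 5, 21), 80.0), (date(2015, 6, 10), 95.0)]
flowering = date(2015, 5, 11)  # hypothetical flowering date from phenology notes
print(value_at(flowering, heights))  # 60.0
```

Keeping the raw time series means another elaborated variable (for example, height at another phenological stage) can be produced later simply by calling the same function with a different date.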

Some popular tools

1. Repositories, information systems and data integration tools

The Breeding Management System (BMS) generates a standard format for collecting nursery and trial data in the field. It uses the Crop Research Ontology to document experiment-related metadata, and the trait ontologies of the Crop Ontology to document variables. This format makes it possible to analyze data directly with statistical tools such as Breeding View or Meta-R.

GnpIS is an INRA information system designed for plant and pest genomics. It enables scientists to mine genomic, phenomic and genetic data. For phenomic and phenotype data, it supports data discovery through a keyword-based, Google-like search engine, as well as data mining. The latter allows users to build datasets for genetic or phenomic analyses. Data integration in GnpIS relies on strict identification of germplasm, and of variables through ontologies such as those of the Crop Ontology.
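Why strict identification matters can be shown with two trials that name the same trait differently. In this minimal sketch, each dataset carries a mapping from its local column names to a shared variable ID, and observations are pooled by (germplasm ID, variable ID); unmapped columns are reported instead of being guessed. The `integrate` helper, the `VAR:0001` ID and the data are hypothetical and do not describe GnpIS internals.

```python
from collections import defaultdict


def integrate(datasets):
    """Pool observations from several datasets by (germplasm ID, variable ID).

    Each dataset is (column_to_variable, rows): column_to_variable maps local
    column names to shared variable IDs, and each row is a dict carrying a
    'germplasm_id' plus measured values.
    """
    pooled = defaultdict(list)
    unmapped = set()
    for column_to_variable, rows in datasets:
        for row in rows:
            for column, value in row.items():
                if column == "germplasm_id":
                    continue
                variable_id = column_to_variable.get(column)
                if variable_id is None:
                    unmapped.add(column)  # no shared identifier: cannot integrate safely
                    continue
                pooled[(row["germplasm_id"], variable_id)].append(value)
    return dict(pooled), unmapped


# Two hypothetical trials using different local names for the same variable.
trial_a = ({"height": "VAR:0001"}, [{"germplasm_id": "G-0001", "height": 92.5}])
trial_b = ({"PH": "VAR:0001"}, [{"germplasm_id": "G-0001", "PH": 90.0, "notes": "lodged"}])
pooled, unmapped = integrate([trial_a, trial_b])
print(pooled)    # {('G-0001', 'VAR:0001'): [92.5, 90.0]}
print(unmapped)  # {'notes'}
```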

The Breeding API specifies a standard interface for plant phenotype/genotype databases to serve their data to crop breeding applications. It is a shared, open API, to be used by all data providers and data consumers who wish to participate.
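A typical consumer of the Breeding API walks through a paginated list endpoint. The sketch below assumes BrAPI-style list responses, where records sit under `result.data` and the page count under `metadata.pagination.totalPages`, and that pages are requested with `page` (starting at 0) and `pageSize` query parameters; check these against the version of the specification your server implements. The server URL and the `iter_brapi` helper are hypothetical, and `fetch` can be swapped out for testing without a network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode one JSON page."""
    with urlopen(url) as response:
        return json.load(response)


def iter_brapi(base_url, endpoint, page_size=100, fetch=fetch_json):
    """Yield every record of a paginated BrAPI-style list endpoint."""
    page = 0
    while True:
        query = urlencode({"page": page, "pageSize": page_size})
        body = fetch(f"{base_url}/{endpoint}?{query}")
        yield from body["result"]["data"]
        page += 1
        if page >= body["metadata"]["pagination"]["totalPages"]:
            break


# Usage against a hypothetical server (requires network access):
# for germplasm in iter_brapi("https://example.org/brapi/v2", "germplasm"):
#     print(germplasm["germplasmDbId"], germplasm.get("germplasmName"))
```

Because every participating database exposes the same paging and response structure, the same client code works across providers.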

For biorefinery applications, the @Web platform can be used to find the pretreatment/biomass combination that gives the best glucose yield. The Documents tab on @Web organizes information by type of pretreatment (topics Bioref-XX). Available data include glucose yields, the pretreatments used, and biomass types and characterization. In the future, it will also be possible to find the best pretreatment/phenotype match.
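Finding the best pretreatment for each biomass amounts to a grouped maximum over glucose yields. The sketch below assumes records with `biomass`, `pretreatment` and `glucose_yield` fields; these field names, the `best_pretreatments` helper and the values are hypothetical placeholders, not data exported from @Web.

```python
def best_pretreatments(records):
    """Return {biomass: (pretreatment, glucose_yield)} keeping the highest yield."""
    best = {}
    for record in records:
        current = best.get(record["biomass"])
        if current is None or record["glucose_yield"] > current["glucose_yield"]:
            best[record["biomass"]] = record
    return {biomass: (r["pretreatment"], r["glucose_yield"]) for biomass, r in best.items()}


# Hypothetical records; yields are placeholders in arbitrary units.
records = [
    {"biomass": "biomass_1", "pretreatment": "pretreatment_A", "glucose_yield": 0.42},
    {"biomass": "biomass_1", "pretreatment": "pretreatment_B", "glucose_yield": 0.55},
    {"biomass": "biomass_2", "pretreatment": "pretreatment_A", "glucose_yield": 0.30},
]
print(best_pretreatments(records))
# {'biomass_1': ('pretreatment_B', 0.55), 'biomass_2': ('pretreatment_A', 0.3)}
```

The comparison is only meaningful if all yields use the same unit and hydrolysis conditions, which is exactly what shared ontology terms for pretreatments and processes help guarantee.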

CyVerse (formerly iPlant Collaborative) offers many services for analyzing genomic, environmental and phenotypic data.

2. Data acquisition

Field Book is a simple app for taking phenotypic notes on field research plots. Collecting data in the field has traditionally been a laborious process, requiring notes to be written by hand and then transcribed. Field Book was created to replace paper field books, increasing collection speed while improving data integrity.

Things to follow in the future

Candidate formats

PED
BagIt
HDF5 (Hierarchical Data Format, version 5)

Written by: WDI working group
Published on: 02 October 2014
Updated on: 27 April 2015

2 Comments

Bettina Berger

27 November 2015 - 1 h 51 min

Sincere apologies for not taking sufficient time for a thorough assessment of this site.
Just some minor points I noticed while browsing through.

I could not find a link to PATO, which I would assume is a useful tool for annotating phenotypes
Also, there’s no reference to iPlant in the US. This may be intentional, but the first phenotype dataset we were requested to make available for a publication was deposited on iPlant and more may follow in future.

1. Cyril Pommier
  
  22 December 2015 - 18 h 26 min
  
  Thank you for your feedback.
  We have added the link to PATO on the ontology page. There is now also a quick reference to iPlant, but it is not very detailed since we are not direct users of this system. Feel free to send us a more detailed description, and we will work on its integration.