Nap Jan-Peter, Sanchez-Perez Gabino F, van Dijk Aalt D J
Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands.
Laboratory of Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands.
PLoS One. 2017 Aug 10;12(8):e0182097. doi: 10.1371/journal.pone.0182097. eCollection 2017.
Understanding phenotypes and their genetic basis is a major focus of current plant biology. Large amounts of phenotype data are being generated, both for macroscopic phenotypes such as size or yield and for molecular phenotypes such as expression levels and metabolite levels. More insight into the underlying genetic and molecular mechanisms that influence phenotypes will enable a better understanding of how various phenotypes are related to each other. This will be a major step forward in understanding plant biology, with immediate value for plant breeding and academic plant research. Currently, however, the genetic basis of most phenotypes remains to be discovered, and the relatedness of different traits is unclear. Here we present a novel approach to connect phenotypes to underlying biological processes and molecular functions. These connections define similarities between different types of phenotypes. The approach starts from Quantitative Trait Locus (QTL) data, which are abundantly available for many phenotypes of interest. For a given phenotype, be it macroscopic or molecular, overrepresentation analysis of gene functions, based on Gene Ontology term enrichment across multiple QTL regions, yields a small set of biological processes and molecular functions. Subsequently, similarity between different phenotypes can be defined in terms of these gene functions. Using publicly available rice data as an example, a close relationship with defined molecular phenotypes is demonstrated for many macroscopic phenotypes. This includes, for example, a link between 'leaf senescence' and 'aspartic acid', as well as between 'days to maturity' and 'choline'. Relationships between macroscopic and molecular phenotypes may result in more efficient marker-assisted breeding and are likely to direct future research aimed at a better understanding of plant phenotypes.
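The two steps described in the abstract, Gene Ontology term enrichment over the genes in a phenotype's QTL regions followed by comparison of phenotypes by their enriched terms, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistical test or the similarity measure, so a one-sided hypergeometric test and Jaccard similarity are assumed here, multiple-testing correction is omitted, and all gene identifiers, term identifiers, phenotype names and function names are hypothetical toy values.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n carry the term."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def enriched_terms(qtl_genes, annotations, alpha=0.05):
    """Return {term: p-value} for terms overrepresented among the QTL genes.

    annotations maps each background gene to its set of GO terms.
    Assumption: uncorrected p-values with a fixed cutoff alpha.
    """
    background = set(annotations)
    selected = set(qtl_genes) & background
    M, N = len(background), len(selected)

    # Invert the annotation map: term -> genes carrying that term.
    term_genes = {}
    for gene, terms in annotations.items():
        for term in terms:
            term_genes.setdefault(term, set()).add(gene)

    result = {}
    for term, genes in term_genes.items():
        k = len(genes & selected)
        if k == 0:
            continue
        p = hypergeom_sf(k, M, len(genes), N)
        if p <= alpha:
            result[term] = p
    return result


def jaccard(terms_a, terms_b):
    """Jaccard similarity between two sets of enriched terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy background of 20 hypothetical genes with hypothetical GO terms.
annotations = {f"g{i}": set() for i in range(20)}
for i in range(4):
    annotations[f"g{i}"].add("GO:A")
for i in range(10):
    annotations[f"g{i}"].add("GO:B")
for i in range(10, 20):
    annotations[f"g{i}"].add("GO:C")

# Genes pooled across the QTL regions of three hypothetical phenotypes.
qtl_genes = {
    "macro_trait": ["g0", "g1", "g2", "g10"],
    "metabolite_1": ["g1", "g2", "g3", "g15"],
    "metabolite_2": ["g10", "g11", "g12", "g13"],
}

enriched = {name: enriched_terms(genes, annotations) for name, genes in qtl_genes.items()}
for name in ("metabolite_1", "metabolite_2"):
    print(name, jaccard(enriched["macro_trait"], enriched[name]))
```

In this toy setting a macroscopic phenotype and a molecular phenotype count as related when their QTL regions are enriched for the same gene functions, even if the regions themselves only partly overlap. A real analysis would need a genome-wide annotation as background and a correction for the number of terms tested.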