Tanaka Ryokei, Kawai Tsubasa, Kawakatsu Taiji, Tanaka Nobuhiro, Shenton Matthew, Yabe Shiori, Uga Yusaku
Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan.
Institute of Agrobiological Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8604, Japan.
BMC Genomics. 2024 Oct 1;25(1):915. doi: 10.1186/s12864-024-10803-3.
Transcriptome-based prediction of complex phenotypes is a relatively new statistical method that links genetic variation to phenotypic variation. The selection of large-effect genes based on a priori biological knowledge is beneficial for predicting oligogenic traits; however, such a simple gene selection method is not applicable to polygenic traits because causal genes or large-effect loci are often unknown. Here, we used several gene-level features and tested whether it was possible to select a gene subset that resulted in better predictive ability than using all genes for predicting a polygenic trait.
Using the phenotypic values of shoot and root traits and transcript abundances in leaves and roots of 57 rice accessions, we evaluated the predictive abilities of the transcriptome-based prediction models. Leaf transcripts predicted shoot phenotypes, such as plant height, more accurately than root transcripts, whereas root transcripts predicted root phenotypes, such as crown root length, more accurately than leaf transcripts. Furthermore, we used the following three gene-level features to select the gene subsets on which the prediction model was trained: (1) tissue specificity of the transcripts, (2) ontology annotations, and (3) co-expression modules. Although models trained on a gene subset often had lower predictive ability than the model trained on all genes, some gene subsets improved predictive ability. For example, using genes expressed in roots but not in leaves, the predictive ability for crown root diameter improved by more than 10% in relative terms (R = 0.59 using all genes; R = 0.66 using 1,554 root-specifically expressed genes). Similarly, for root dry weight, genes annotated as "gibberellic acid sensitivity" gave higher predictive ability than all genes.
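The tissue-specificity filter and the comparison of predictive abilities described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that R denotes the Pearson correlation between observed and predicted phenotypes, and that "expressed" means a mean abundance at or above a cutoff. The function names, the cutoff of 1.0, and the toy gene abundances are all hypothetical; only the two R values (0.59 and 0.66) come from the text.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted phenotype values.

    Assumption: this is the predictive-ability measure R; the abstract
    does not define R explicitly.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def root_specific_genes(root_expr, leaf_expr, threshold=1.0):
    """Return genes expressed in roots but not in leaves, sorted by name.

    root_expr, leaf_expr: dicts mapping gene ID -> mean transcript abundance.
    threshold: hypothetical expression cutoff, not taken from the paper.
    """
    return sorted(
        gene
        for gene, abundance in root_expr.items()
        if abundance >= threshold and leaf_expr.get(gene, 0.0) < threshold
    )


def relative_gain(r_all, r_subset):
    """Relative change in predictive ability when moving to a gene subset."""
    return (r_subset - r_all) / r_all


# Toy abundances (invented for illustration only).
root_expr = {"geneA": 12.0, "geneB": 0.2, "geneC": 5.5}
leaf_expr = {"geneA": 8.0, "geneB": 0.1, "geneC": 0.3}
subset = root_specific_genes(root_expr, leaf_expr)

# R values reported in the text for crown root diameter.
gain = relative_gain(0.59, 0.66)
print(subset, f"{gain:.1%}")
```

With the reported values, the relative gain is (0.66 - 0.59) / 0.59, or about 11.9%, consistent with the "more than 10%" stated above. In a real analysis the predictions passed to `pearson_r` would come from cross-validation, so that accessions used for evaluation are not also used to fit the model.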
Our results highlight both the possibility and difficulty of selecting an appropriate gene subset to predict polygenic traits from transcript abundance, given the current biological knowledge and information. Further integration of multiple sources of information, as well as improvements in gene characterization, may enable the selection of an optimal gene set for the prediction of polygenic phenotypes.