Viñas Ramon, Azevedo Tiago, Gamazon Eric R, Liò Pietro
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom.
Vanderbilt Genetics Institute and Data Science Institute, VUMC, Nashville, TN, United States.
Front Genet. 2021 Apr 13;12:624128. doi: 10.3389/fgene.2021.624128. eCollection 2021.
A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we propose two novel deep learning methods, PMI and GAIN-GTEx, for gene expression imputation. To increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We show that our approaches compare favorably to several standard and state-of-the-art imputation methods in terms of predictive performance and runtime in two case studies and two imputation scenarios. In comparisons on protein-coding genes, PMI attains the highest performance in inductive imputation, whereas GAIN-GTEx outperforms the other methods in in-place imputation. Furthermore, our results indicate strong generalization to RNA-Seq data from three cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.
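To make "in-place imputation across varying levels of missingness" concrete, the sketch below hides a random fraction of entries in a samples × genes matrix, fills them with a simple per-gene mean baseline, and scores the error only on the hidden entries. This is an illustrative assumption, not PMI or GAIN-GTEx: the mean baseline, the masked mean squared error metric, the helper names (`mask_entries`, `mean_impute`, `masked_mse`), and the synthetic matrix are all hypothetical stand-ins, and no GTEx data is used.

```python
import numpy as np

def mask_entries(x, missing_rate, rng):
    """Hypothetical helper: boolean mask, True where an entry is hidden (treated as missing)."""
    return rng.random(x.shape) < missing_rate

def mean_impute(x, mask):
    """Baseline (not the paper's method): fill hidden entries with the per-gene mean of observed entries."""
    observed = np.where(mask, np.nan, x)
    col_means = np.nanmean(observed, axis=0)
    # A gene with every entry hidden would give NaN; fall back to 0.0 in that case.
    col_means = np.nan_to_num(col_means, nan=0.0)
    # Observed entries are kept as-is; only hidden entries are replaced.
    return np.where(mask, col_means[np.newaxis, :], x)

def masked_mse(x_true, x_imputed, mask):
    """Mean squared error computed only over the hidden entries."""
    diff = (x_true - x_imputed)[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0

rng = np.random.default_rng(0)
# Synthetic samples x genes matrix with gene-specific offsets (illustrative, not GTEx).
x = rng.normal(size=(200, 50)) + rng.normal(size=(1, 50)) * 3
for rate in (0.1, 0.3, 0.5):
    mask = mask_entries(x, rate, rng)
    x_hat = mean_impute(x, mask)
    print(f"missing={rate:.1f}  masked MSE={masked_mse(x, x_hat, mask):.3f}")
```

Any learned imputer could be swapped in for `mean_impute` and compared under the same masking and scoring; scoring only the hidden entries keeps the comparison from being inflated by values the model was given.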