Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN 37232, USA.
Bioinformatics. 2022 Oct 14;38(20):4697-4704. doi: 10.1093/bioinformatics/btac608.
Analyzing whole-genome sequencing (WGS) data for genetic studies remains challenging because non-coding variants, especially rare ones, lack accurate functional annotation. Because eQTLs have been extensively implicated in the genetics of human diseases, we hypothesize that rare non-coding variants discovered by WGS play a regulatory role in disease predisposition.
With thousands of tissue- and cell-type-specific epigenomic features, we propose TVAR, a multi-label learning-based deep neural network that predicts the functionality of non-coding variants in the genome based on eQTLs across 49 human tissues in the GTEx project. TVAR learns the relationships between high-dimensional epigenomic features and eQTLs across tissues, taking the correlation among tissues into account to capture both shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average AUROC of 0.77 across these tissues. Using these tissue-specific annotations, we evaluate TVAR on four complex diseases (coronary artery disease, breast cancer, type 2 diabetes and schizophrenia) and observe that it outperforms five existing state-of-the-art tools in predicting functional variants, for both common and rare variants. We further evaluate TVAR's G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci and variants validated by Massively Parallel Reporter Assay (MPRA), and observe consistently better performance than the competing tools.
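To illustrate how an average AUROC across tissues can be obtained, the following is a minimal sketch that computes one AUROC per tissue and then takes their unweighted mean. The abstract does not say how the 0.77 average was computed, so the unweighted (macro) mean is an assumption. The function names, tissue names, labels and scores are all hypothetical toy values, not TVAR outputs or part of the TVAR code base.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(labels_by_tissue, scores_by_tissue):
    """Return per-tissue AUROCs and their unweighted (macro) mean."""
    per_tissue = {
        tissue: auroc(labels_by_tissue[tissue], scores_by_tissue[tissue])
        for tissue in labels_by_tissue
    }
    return per_tissue, sum(per_tissue.values()) / len(per_tissue)


# Hypothetical toy data: 1 = variant is an eQTL in that tissue, 0 = it is not.
toy_labels = {"liver": [1, 0, 1, 0], "lung": [1, 1, 0, 0]}
# Hypothetical per-tissue prediction scores for the same four variants.
toy_scores = {"liver": [0.9, 0.2, 0.4, 0.6], "lung": [0.8, 0.7, 0.3, 0.1]}

per_tissue, average = mean_auroc(toy_labels, toy_scores)
print(per_tissue, average)
```

In a multi-label setting each variant carries one label per tissue, so scoring every tissue separately before averaging keeps tissues with many eQTLs from dominating the summary.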
The TVAR source code and its scores on the ClinVar catalog, fine-mapped GWAS loci, high-confidence eQTLs from the GTEx dataset, and MPRA-validated functional variants are available at https://github.com/haiyang1986/TVAR.
Supplementary data are available at Bioinformatics online.