TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions.

Affiliations

Department of Computer Science, Indiana University, Bloomington, IN 47405, USA.

Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA.

Publication information

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad060.

Abstract

MOTIVATION

Small insertions and deletions (sindels) in the human genome have important implications for human disease. One important mechanism by which non-coding sindels (nc-sindels) affect human diseases and phenotypes is the regulation of gene expression. Nevertheless, current sequencing experiments may lack the statistical power and resolution to pinpoint functional sindels because of low minor allele frequency or small effect size. As an alternative strategy, a supervised machine learning method can identify otherwise masked functional sindels by predicting their regulatory potential directly. However, computational methods for annotating and predicting regulatory sindels, especially in non-coding regions, remain underdeveloped.

RESULTS

By leveraging labeled nc-sindels identified by cis-expression quantitative trait loci (cis-eQTL) analyses across 44 tissues in Genotype-Tissue Expression (GTEx), together with a compilation of generic functional annotations and large-scale epigenomic profiles, we develop TIssue-specific Variant Annotation for Non-coding indel (TIVAN-indel), a supervised computational framework for predicting non-coding regulatory sindels. We demonstrate that TIVAN-indel achieves the best prediction performance in both within-tissue and cross-tissue prediction. As an independent evaluation, we train TIVAN-indel on the 'Whole Blood' tissue in GTEx and test the model using 15 immune cell types from an independent study, the Database of Immune Cell Expression. Lastly, we perform an enrichment analysis for both true and predicted sindels in key regulatory regions such as chromatin interactions, open chromatin regions and histone modification sites, and find biologically meaningful enrichment patterns.
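
The enrichment analysis mentioned above can be illustrated with a generic 2x2 overlap test. The following is a minimal sketch only, not the procedure used by TIVAN-indel: the abstract does not state which statistic was applied, so the one-sided Fisher exact test, the function name `enrichment`, and the toy counts are all assumptions made for illustration.

```python
from math import comb


def enrichment(hits_fg, n_fg, hits_bg, n_bg):
    """Odds ratio and one-sided Fisher exact p-value for overlap enrichment.

    hits_fg / n_fg: foreground sindels overlapping a region type / all foreground sindels.
    hits_bg / n_bg: background sindels overlapping a region type / all background sindels.
    (Hypothetical helper; the statistic actually used in the paper is not stated.)
    """
    a, b = hits_fg, n_fg - hits_fg  # foreground: inside region, outside region
    c, d = hits_bg, n_bg - hits_bg  # background: inside region, outside region
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    # Hypergeometric upper tail: probability of observing at least `a`
    # foreground overlaps when the margins of the 2x2 table are fixed.
    total, in_region = n_fg + n_bg, a + c
    tail = sum(
        comb(in_region, k) * comb(total - in_region, n_fg - k)
        for k in range(a, min(in_region, n_fg) + 1)
    )
    p_value = tail / comb(total, n_fg)
    return odds_ratio, p_value


# Toy counts, invented for illustration: 8 of 10 foreground sindels fall in
# open chromatin, versus 2 of 10 background sindels.
odds_ratio, p_value = enrichment(8, 10, 2, 10)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.4f}")
```

An odds ratio above 1 with a small p-value would suggest that the foreground set (for example, true or predicted regulatory sindels) overlaps the region type more often than the background set does. For real data, an established implementation such as a statistics library's Fisher exact test is preferable to this hand-rolled version.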

AVAILABILITY AND IMPLEMENTATION

https://github.com/lichen-lab/TIVAN-indel.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
