Machine learning annotation of human branchpoints.

Affiliations

Genomics and Epigenetics, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia.

St Vincent's Clinical School, University of New South Wales, Sydney, NSW 2052, Australia.

Publication information

Bioinformatics. 2018 Mar 15;34(6):920-927. doi: 10.1093/bioinformatics/btx688.

Abstract

MOTIVATION

The branchpoint element is required for the first lariat-forming reaction in splicing. However, current catalogues of human branchpoints remain incomplete because these splicing elements are difficult to identify experimentally. To address this limitation, we have developed a machine-learning algorithm, branchpointer, to identify branchpoint elements solely from gene annotations and genomic sequence.

RESULTS

Using branchpointer, we annotate branchpoint elements in 85% of human gene introns with a sensitivity of 61.8% and a specificity of 97.8%. In addition to annotation, branchpointer can evaluate the impact of SNPs on branchpoint architecture to inform functional interpretation of genetic variants. Branchpointer identifies all published deleterious branchpoint mutations annotated in clinical variant databases, and finds thousands of additional clinical and common genetic variants with similar predicted effects. This genome-wide annotation of branchpoints provides a reference for the genetic analysis of splicing and the interpretation of noncoding variation.

AVAILABILITY AND IMPLEMENTATION

Branchpointer is written and implemented in the statistical programming language R and is freely available under a BSD license as a package through Bioconductor.

CONTACT

b.signal@garvan.org.au or t.mercer@garvan.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
