DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels.

Author information

Folkman Lukas, Yang Yuedong, Li Zhixiu, Stantic Bela, Sattar Abdul, Mort Matthew, Cooper David N, Liu Yunlong, Zhou Yaoqi

Affiliations

School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4222, Australia; Institute for Integrated and Intelligent Systems, Griffith University, 170 Kessels Road, Brisbane, Queensland 4111, Australia; Queensland Research Laboratory, NICTA - National ICT Australia, 70-72 Bowen Street, Spring Hill, Queensland 4000, Australia; Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4222, Australia; Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK; and Department of Medical and Molecular Genetics, Indiana University School of Medicine, 975 West Walnut Street, MRL Bldg IB130, Indianapolis, IN 46202, USA.

Publication information

Bioinformatics. 2015 May 15;31(10):1599-606. doi: 10.1093/bioinformatics/btu862. Epub 2015 Jan 7.

Abstract

MOTIVATION

Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem.

RESULTS

We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques.
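The three metrics quoted above all follow from a binary confusion matrix. The following is a minimal sketch of how they are computed, not the authors' evaluation code. The function name `binary_metrics` and the counts (86 true positives, 14 false negatives, 72 true negatives, 28 false positives) are hypothetical, chosen only to reproduce the reported sensitivity and specificity. They are not taken from the paper's dataset.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    # Sensitivity: fraction of disease-causing variants predicted as such.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of neutral variants predicted as neutral.
    specificity = tn / (tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts (assumption): 100 disease-causing and 100 neutral variants.
sens, spec, mcc = binary_metrics(tp=86, fn=14, tn=72, fp=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

With these illustrative, class-balanced counts the MCC comes out close to 0.59. On an imbalanced test set the same sensitivity and specificity would give a different MCC, which is one reason to report all three numbers.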

