Discriminating between deleterious and neutral non-frameshifting indels based on protein interaction networks and hybrid properties.

Authors

Zhang Ning, Huang Tao, Cai Yu-Dong

Affiliation

Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, 300072, People's Republic of China.

Publication

Mol Genet Genomics. 2015 Feb;290(1):343-52. doi: 10.1007/s00438-014-0922-5. Epub 2014 Sep 24.

Abstract

Each human genome contains more than ten thousand coding variants; however, our knowledge of how genetic variants underlie phenotypic differences is far from complete. Small insertions and deletions (indels) are among the most common types of human genetic variants, and they play a significant role in human inherited disease. To date, we still lack a comprehensive understanding of how indels cause disease. Identifying and analyzing such deleterious variants is therefore a key challenge and has attracted great interest in current genome biology research. A growing number of computational methods have been developed to discriminate between deleterious and neutral indels. However, most existing methods are based on traditional sequential or structural features, which cannot completely explain the association between indels and the inherited diseases they induce. In this study, we establish a novel method to predict deleterious non-frameshifting indels based on features extracted from both protein interaction networks and traditional hybrid properties. Each indel was encoded by 1,246 features. Using the maximum relevance minimum redundancy method and the incremental feature selection method, we obtained an optimal feature set containing 42 features, of which 21 were derived from protein interaction networks. Based on the optimal feature set, a Random Forest achieved an accuracy of 88% and a Matthews correlation coefficient (MCC) of 0.76, as evaluated by the jackknife cross-validation test. This method outperformed existing methods of predicting deleterious indels and can be applied in practice to predict deleterious non-frameshifting indels in genome research. Analysis of the optimal features selected in the model revealed that network interactions play more important roles than traditional sequential or structural features and could be more informative for illustrating an indel's function and disease associations. These results could shed some light on the genetic basis of human genetic variation and human inherited diseases.
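
The evaluation loop named in the abstract (incremental feature selection scored by a jackknife, i.e. leave-one-out, test and summarized by the MCC) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature ranking is assumed to have been produced already (for example by maximum relevance minimum redundancy, which is not implemented here), a nearest-centroid rule is swapped in for the Random Forest to keep the example dependency-free, and all function names and the toy data are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (assumption): label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_mcc(X, y, feature_idx):
    """Leave each sample out once, predict it from the rest, return the MCC."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        train_X = [[row[j] for j in feature_idx]
                   for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        pred = nearest_centroid_predict(
            train_X, train_y, [X[i][j] for j in feature_idx])
        if pred == 1 and y[i] == 1:
            tp += 1
        elif pred == 0 and y[i] == 0:
            tn += 1
        elif pred == 1 and y[i] == 0:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(X, y, ranked_features):
    """Grow the feature set along the ranking; keep the best-scoring prefix."""
    best_k, best_score = 0, -2.0  # MCC lies in [-1, 1]
    for k in range(1, len(ranked_features) + 1):
        score = jackknife_mcc(X, y, ranked_features[:k])
        if score > best_score:  # strict: ties keep the smaller set
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.10, 5.0], [0.20, 1.0], [0.15, 9.0],
     [0.90, 2.0], [0.80, 8.0], [0.95, 4.0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = deleterious, 0 = neutral (labels assumed)
ranked = [0, 1]         # assumed output of a prior ranking step

selected, score = incremental_feature_selection(X, y, ranked)
print(selected, score)
```

In the study itself the same idea would run over 1,246 ranked features with a Random Forest as the classifier; the sketch only shows how a ranked list, a jackknife test, and the MCC fit together.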
