Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties.

Author information

Pan Yuliang, Liu Diwei, Deng Lei

Affiliations

School of Software, Central South University, Changsha, China.

Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China.

Publication information

PLoS One. 2017 Jun 14;12(6):e0179314. doi: 10.1371/journal.pone.0179314. eCollection 2017.

DOI:10.1371/journal.pone.0179314

PMID:28614374

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5470696/

Abstract

Single amino acid variations (SAVs) can alter biological function, causing disease or contributing to natural differences between individuals. Identifying the relationship between a SAV and a given disease provides a starting point for understanding the mechanisms underlying specific associations, and can support the prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that predicts how likely a SAV is to be associated with disease by combining the gradient tree boosting (GTB) algorithm with optimally selected neighborhood features. A two-step feature selection approach is used to identify, across a wide range of sequence and structural features (including some novel structural neighborhood features), the most relevant and informative neighborhood properties for predicting the disease association of SAVs. In cross-validation experiments on the benchmark dataset, PredSAV achieves an area under the ROC curve (AUC) of 0.908 and a specificity of 0.838, both significantly better than those of other existing methods. We further validate the method on an independent test set, where it also performs competitively. By combining gradient tree boosting with optimally selected neighborhood features, PredSAV can return reliable predictions in distinguishing disease-associated from neutral variants and, compared with existing methods, shows improved specificity as well as increased overall performance.
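The two metrics reported in the abstract, AUC and specificity, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made up purely for illustration (1 = disease-associated, 0 = neutral).

```python
from typing import Sequence


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of neutral variants (label 0) that are predicted neutral."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random disease-associated variant scores higher
    than a random neutral one; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into class labels.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"specificity = {specificity(labels, preds):.3f}")
print(f"AUC         = {auc(labels, scores):.3f}")
```

Specificity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a method's results are often reported with both.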
