Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties.

Authors

Pan Yuliang, Liu Diwei, Deng Lei

Affiliations

School of Software, Central South University, Changsha, China.

Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China.

Publication information

PLoS One. 2017 Jun 14;12(6):e0179314. doi: 10.1371/journal.pone.0179314. eCollection 2017.

PMID: 28614374
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5470696/

Abstract

Single amino acid variations (SAVs) can alter biological function, causing disease or natural differences between individuals. Identifying the relationship between a SAV and a particular disease provides the starting point for understanding the mechanisms underlying specific associations, and can aid the prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that predicts how likely a SAV is to be associated with disease by combining the gradient tree boosting (GTB) algorithm with optimally selected neighborhood features. A two-step feature selection approach is used to identify the most relevant and informative neighborhood properties for predicting the disease association of SAVs, drawing on a wide range of sequence and structural features that includes some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves an AUC of 0.908 and a specificity of 0.838, both significantly better than those of existing methods. We further validate the method on an independent test set, where it also compares favorably with existing methods. PredSAV can thus return reliable predictions when distinguishing disease-associated from neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as better overall performance.
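
The abstract does not give PredSAV's implementation, features, or hyperparameters, so the following is only a minimal sketch of the general gradient tree boosting idea it names: an additive model of small trees, each fitted to the negative gradient of the logistic loss. Everything here is an assumption made for illustration: the function names (`fit_stump`, `fit_gtb`, `predict_proba`), the use of depth-1 trees with mean-residual leaf values, the learning rate, and the toy two-feature data, which stands in for the real neighborhood features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the depth-1
    tree that best fits the residuals in the least-squares sense."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gtb(X, y, n_rounds=50, lr=0.3):
    """Boost stumps on the logistic loss. Assumes both classes are present
    and at least one feature takes more than one value."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))      # initial score: log-odds of the positive class
    F = [f0] * len(y)                 # current raw score for each sample
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the raw score.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        F = [fi + lr * (lm if row[j] <= t else rm) for row, fi in zip(X, F)]
    return f0, lr, stumps

def predict_proba(model, row):
    """Probability that `row` belongs to the positive (disease-associated) class."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(score)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.7, 4.0], [0.8, 5.0], [0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gtb(X, y)
print(predict_proba(model, [0.85, 4.0]))
```

Production implementations usually fit deeper trees and set leaf values with a Newton-style step rather than the plain residual mean used here; the sketch keeps only the stage-wise structure of the method.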
