Improving the prediction of disease-related variants using protein three-dimensional structure.

Affiliation

Department of Bioengineering, Stanford University, Stanford, CA, USA.

Publication information

BMC Bioinformatics. 2011;12 Suppl 4(Suppl 4):S3. doi: 10.1186/1471-2105-12-S4-S3. Epub 2011 Jul 5.

DOI:10.1186/1471-2105-12-S4-S3
PMID:21992054
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3194195/
Abstract

BACKGROUND

Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures.

RESULTS

In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains, using input features derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. Compared with a similar sequence-based predictor, SVM-3D increases the overall accuracy and AUC by 3% and the correlation coefficient by 0.06. The robustness of this improvement was tested on different datasets; in all cases SVM-3D performs better than previously developed methods, including PolyPhen2, which explicitly takes protein structure information as input.
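
The three reported measures can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: the "correlation coefficient" is taken to be the Matthews correlation coefficient (a common choice for binary predictors), the balancing step is shown as random undersampling of the larger class (the abstract does not say which balancing method was used), and all function names and toy numbers are hypothetical.

```python
import math
import random


def balance_by_undersampling(positives, negatives, seed=0):
    """Randomly subsample both classes to the size of the smaller one.

    Assumption: one simple balancing strategy, not necessarily the paper's.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


def accuracy_and_mcc(tp, tn, fp, fn):
    """Overall accuracy and Matthews correlation from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. Quadratic in the number of examples, which is
    acceptable for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Class sizes from the abstract; the items themselves are placeholders.
    disease, neutral = balance_by_undersampling(list(range(3342)), list(range(1644)))
    print(len(disease), len(neutral))

    # Toy confusion matrix (hypothetical counts, not the paper's results).
    print(accuracy_and_mcc(tp=85, tn=85, fp=15, fn=15))

    # Toy scores: higher means "more likely disease-related".
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

With the toy counts above (85 correct and 15 wrong in each class), the accuracy is 0.85 and the Matthews correlation is 0.70. This shows how the two reported figures can coincide on a balanced set with symmetric errors, but it says nothing about the paper's actual confusion matrix.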

CONCLUSION

This work demonstrates that structural information can increase the accuracy of identifying disease-related SAPs. Our results also quantify the magnitude of the improvement on a large dataset. This improvement agrees with previously observed results, in which structural information enhanced the prediction of protein stability changes upon mutation. Although the limited structural information available in the Protein Data Bank restricts the application and performance of our structure-based method, we expect SVM-3D to achieve higher accuracy as more structural data become available.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/3371553b08d6/1471-2105-12-S4-S3-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/ccd101232a17/1471-2105-12-S4-S3-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/0cadd6370490/1471-2105-12-S4-S3-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/2b5abd75bed7/1471-2105-12-S4-S3-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/b473208768c2/1471-2105-12-S4-S3-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecfb/3194195/f906c1290f0f/1471-2105-12-S4-S3-6.jpg
