Predicting cancer-associated germline variations in proteins.

Affiliation

Biocomputing Group, CIRI-Health Science and Technology/Department of Biology, via San Giacomo 9/2, Bologna, Italy.

Publication

BMC Genomics. 2012 Jun 18;13(Suppl 4):S8. doi: 10.1186/1471-2164-13-S4-S8.

PMID:22759656

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3372458/

Abstract

BACKGROUND

Various computational methods are currently available to classify whether a protein variation is disease-associated. However, data from recent technological advances make it feasible to extend the annotation of disease-associated variations to include specific phenotypes. Here we tackle the problem of distinguishing genetic variations associated with cancer from variations associated with other genetic diseases.

RESULTS

We implement a new method based on Support Vector Machines (SVMs) that takes as input the protein variant and the protein function, as described by the protein's associated Gene Ontology (GO) terms. Our approach succeeds in discriminating germline variants that are likely to be cancer-associated from those related to other genetic disorders. On a set of 6478 germline variations in 592 proteins (16% of them cancer-associated), the method achieves 90% accuracy and a Matthews correlation coefficient of 0.61. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore, the method correctly excludes about 96% of 3392 somatic cancer-associated variations in 1983 proteins that were not included in the training/testing set.
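
The abstract reports accuracy, the Matthews correlation coefficient (MCC) and cancer-class sensitivity and specificity, but not the confusion matrix behind them. The sketch below computes these scores from hypothetical counts. One assumption deserves a loud flag: it reads the cancer-class "specificity" as precision, TP/(TP+FP), a per-class convention used in some bioinformatics papers. Under the textbook definition, TN/(TN+FP), a 66% value combined with 69% sensitivity would put accuracy near 66% on a set that is only 16% cancer-associated, far from the reported 90%. The function name `binary_metrics` and the counts are ours, not the authors'.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Scores for a two-class confusion matrix (positive class = cancer-associated)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # recall: fraction of cancer variants recovered
    precision = tp / (tp + fp)            # assumed meaning of per-class "specificity"
    true_negative_rate = tn / (tn + fp)   # textbook specificity, shown for comparison
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "true_negative_rate": true_negative_rate,
        "mcc": mcc,
    }


# Hypothetical counts back-calculated from the abstract's percentages
# (6478 variants, about 16% cancer-associated); NOT counts reported by the authors.
tp, fn, fp, tn = 715, 321, 368, 5074
for name, value in binary_metrics(tp, fn, fp, tn).items():
    print(f"{name}: {value:.3f}")
```

With these counts the scores come out at roughly 0.89 accuracy, 0.69 sensitivity, 0.66 precision and 0.61 MCC, close to the published figures, while the true-negative rate is about 0.93. Because the abstract does not give the confusion matrix, the counts are only an approximate reconstruction.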

CONCLUSIONS

Here we show that a large set of cancer-associated germline protein variations can be successfully discriminated from those associated with other genetic disorders. This is a further step in the process of protein variant annotation. Performance scores improve substantially when protein function, as encoded by Gene Ontology terms, is considered, corroborating the role of protein function as a key feature for correctly annotating its variations.
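
The abstract does not describe how the substitution and the GO terms are turned into SVM inputs, so the following is a hypothetical encoding for illustration only, not the authors' feature set. It assumes the substitution is a 20-element vector (-1 at the wild-type residue, +1 at the mutant residue) and that a protein's GO annotations are collapsed into a single log-odds score estimated from training variants labelled cancer-associated versus other-disease. The function names, the pseudocount and the GO identifiers in the toy data are our own choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def encode_substitution(wild_type, mutant):
    """Hypothetical encoding: -1 at the wild-type residue, +1 at the mutant residue."""
    if wild_type == mutant:
        raise ValueError("not a missense substitution")
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(wild_type)] = -1.0
    vec[AMINO_ACIDS.index(mutant)] = 1.0
    return vec


def go_log_odds_table(training, pseudocount=1.0):
    """Per-term log((n_cancer + c) / (n_other + c)) over labelled training variants.

    `training` is a list of (go_terms, is_cancer) pairs, one per variant.
    """
    cancer, other = Counter(), Counter()
    for go_terms, is_cancer in training:
        (cancer if is_cancer else other).update(set(go_terms))
    return {
        term: math.log((cancer[term] + pseudocount) / (other[term] + pseudocount))
        for term in set(cancer) | set(other)
    }


def go_score(go_terms, table):
    """Sum of log-odds over a protein's GO terms; terms unseen in training add 0."""
    return sum(table.get(term, 0.0) for term in set(go_terms))


def feature_vector(wild_type, mutant, go_terms, table):
    """21 inputs: the 20-element substitution encoding plus one GO score."""
    return encode_substitution(wild_type, mutant) + [go_score(go_terms, table)]


# Toy training data with illustrative GO identifiers (not taken from the paper).
training = [
    ({"GO:0006281", "GO:0005634"}, True),
    ({"GO:0006281"}, True),
    ({"GO:0005634", "GO:0006814"}, False),
    ({"GO:0006814"}, False),
]
table = go_log_odds_table(training)
print(feature_vector("R", "W", {"GO:0006281", "GO:0005634"}, table))
```

In a cross-validation setting the log-odds table should be estimated on the training fold only; building it from all variants would leak label information into the test scores. The resulting vectors could then be passed to any SVM implementation.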
