Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines.

Authors

Tian Jian, Wu Ningfeng, Guo Xuexia, Guo Jun, Zhang Juhua, Fan Yunliu

Affiliation

Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.

Publication

BMC Bioinformatics. 2007 Nov 16;8:450. doi: 10.1186/1471-2105-8-450.

DOI: 10.1186/1471-2105-8-450
PMID: 18005451
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2216041/
Abstract

BACKGROUND

Human genetic variations primarily result from single nucleotide polymorphisms (SNPs) that occur approximately every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. As such, the development of computational tools that can identify such nsSNPs would enhance our understanding of genetic diseases and help predict the disease.

RESULTS

We propose a method, named Parepro (Predicting the amino acid replacement probability), to identify nsSNPs having either deleterious or neutral effects on the resulting protein function. Two independent datasets, HumVar and NewHumVar, taken from the PhD-SNP server, were used to train the model and test the robustness of Parepro. Using a 20-fold cross-validation test on the HumVar dataset, Parepro achieved a Matthews correlation coefficient (MCC) of 50% and an overall accuracy (Q2) of 76%, both of which were higher than those obtained with methods such as PolyPhen, SIFT, and HybridMeth. Further analysis of the second dataset (NewHumVar) using Parepro yielded similar results.

CONCLUSION

The performance of Parepro indicates that it is a powerful tool for predicting the effect of nsSNPs on protein function and would be useful for large-scale analysis of genomic nsSNP data.
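
The abstract reports two evaluation figures, the Matthews correlation coefficient (MCC) and the overall accuracy (Q2), obtained with 20-fold cross-validation. As a rough illustration of what those quantities mean, the sketch below computes both from binary predictions and shows one simple way to split a dataset into k folds. It is not the authors' code: the function names, the label convention (1 = deleterious, 0 = neutral), the contiguous fold split, and the toy labels are all assumptions made for the example. MCC ranges from -1 to 1, so the reported "50%" presumably corresponds to a value of 0.50.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, assuming 1 = deleterious, 0 = neutral."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def q2(tp, tn, fp, fn):
    """Overall two-class accuracy: fraction of correct predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined here as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Toy labels invented for illustration; not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print(f"Q2  = {q2(*counts):.2f}")   # Q2  = 0.75
    print(f"MCC = {mcc(*counts):.2f}")  # MCC = 0.50

    # 20 folds over a hypothetical dataset of 45 variants.
    for train_idx, test_idx in k_fold_indices(45, 20):
        pass  # train a classifier on train_idx, evaluate on test_idx
```

In a real evaluation the data would normally be shuffled (and often stratified by class) before splitting, and the per-fold counts would be pooled or averaged before computing MCC and Q2.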

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9170/2216041/05d08f4ea1f9/1471-2105-8-450-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9170/2216041/c01be6c8d6cb/1471-2105-8-450-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9170/2216041/2ab85515ac9c/1471-2105-8-450-2.jpg

Similar articles

1
Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines.
BMC Bioinformatics. 2007 Nov 16;8:450. doi: 10.1186/1471-2105-8-450.
2
Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information.
Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3.
3
An ANN model for the identification of deleterious nsSNPs in tumor suppressor genes.
Bioinformation. 2011 Mar 2;6(1):41-4. doi: 10.6026/97320630006041.
4
Identification and structural comparison of deleterious mutations in nsSNPs of ABL1 gene in chronic myeloid leukemia: a bio-informatics study.
J Biomed Inform. 2008 Aug;41(4):607-12. doi: 10.1016/j.jbi.2007.12.004. Epub 2007 Dec 31.
5
SNPeffect v2.0: a new step in investigating the molecular phenotypic effects of human non-synonymous SNPs.
Bioinformatics. 2006 Sep 1;22(17):2183-5. doi: 10.1093/bioinformatics/btl348. Epub 2006 Jun 29.
6
GESPA: classifying nsSNPs to predict disease association.
BMC Bioinformatics. 2015 Jul 25;16:228. doi: 10.1186/s12859-015-0673-2.
7
Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning.
PLoS Comput Biol. 2014 May 1;10(5):e1003592. doi: 10.1371/journal.pcbi.1003592. eCollection 2014 May.
8
Phenotype prediction of non-synonymous single-nucleotide polymorphisms in human ATP-binding cassette transporter genes.
Basic Clin Pharmacol Toxicol. 2011 Feb;108(2):94-114. doi: 10.1111/j.1742-7843.2010.00627.x. Epub 2010 Sep 6.
9
Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs.
Biochem Biophys Res Commun. 2012 Mar 2;419(1):99-103. doi: 10.1016/j.bbrc.2012.01.138. Epub 2012 Feb 4.
10
Automated identification of single nucleotide polymorphisms from sequencing data.
Proc IEEE Comput Soc Bioinform Conf. 2002;1:87-93.

Cited by

1
Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors.
Hum Genomics. 2024 Aug 28;18(1):90. doi: 10.1186/s40246-024-00663-z.
2
Rapid discrimination between deleterious and benign missense mutations in the CAGI 6 experiment.
Hum Genomics. 2024 Aug 27;18(1):89. doi: 10.1186/s40246-024-00655-z.
3
Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors.
bioRxiv. 2024 Jun 28:2024.06.25.600283. doi: 10.1101/2024.06.25.600283.
4
Enhancing the endo-activity of the thermophilic chitinase to yield chitooligosaccharides with high degrees of polymerization.
Bioresour Bioprocess. 2024 Mar 7;11(1):29. doi: 10.1186/s40643-024-00735-x.
5
Mining and rational design of psychrophilic catalases using metagenomics and deep learning models.
Appl Microbiol Biotechnol. 2024 Dec;108(1):31. doi: 10.1007/s00253-023-12926-1. Epub 2024 Jan 4.
6
MPEPE, a predictive approach to improve protein expression in E. coli based on deep learning.
Comput Struct Biotechnol J. 2022 Mar 1;20:1142-1153. doi: 10.1016/j.csbj.2022.02.030. eCollection 2022.
7
Pathogenic nsSNPs that increase the risks of cancers among the Orang Asli and Malays.
Sci Rep. 2021 Aug 9;11(1):16158. doi: 10.1038/s41598-021-95618-y.
8
VIPdb, a genetic Variant Impact Predictor Database.
Hum Mutat. 2019 Sep;40(9):1202-1214. doi: 10.1002/humu.23858. Epub 2019 Aug 17.
9
Role of Structural Bioinformatics in Drug Discovery by Computational SNP Analysis: Analyzing Variation at the Protein Level.
Glob Heart. 2017 Jun;12(2):151-161. doi: 10.1016/j.gheart.2017.01.009. Epub 2017 Mar 13.
10
Computational approaches for predicting mutant protein stability.
J Comput Aided Mol Des. 2016 May;30(5):401-12. doi: 10.1007/s10822-016-9914-3. Epub 2016 May 9.

References

1
Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information.
Bioinformatics. 2006 Nov 15;22(22):2729-34. doi: 10.1093/bioinformatics/btl423. Epub 2006 Aug 7.
2
Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.
BMC Bioinformatics. 2006 Apr 21;7:217. doi: 10.1186/1471-2105-7-217.
3
Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods.
Nucleic Acids Res. 2006 Mar 6;34(5):1317-25. doi: 10.1093/nar/gkj518. Print 2006.
4
Accurate prediction of the functional significance of single nucleotide polymorphisms and mutations in the ABCA1 gene.
PLoS Genet. 2005 Dec;1(6):e83. doi: 10.1371/journal.pgen.0010083. Epub 2005 Dec 30.
5
Identification and analysis of deleterious human SNPs.
J Mol Biol. 2006 Mar 10;356(5):1263-74. doi: 10.1016/j.jmb.2005.12.025. Epub 2005 Dec 27.
6
Prediction of protein stability changes for single-site mutations using support vector machines.
Proteins. 2006 Mar 1;62(4):1125-32. doi: 10.1002/prot.20810.
7
A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli.
Bioinformatics. 2006 Feb 1;22(3):278-84. doi: 10.1093/bioinformatics/bti810. Epub 2005 Dec 6.
8
Predicting protein stability changes from sequences using support vector machines.
Bioinformatics. 2005 Sep 1;21 Suppl 2:ii54-8. doi: 10.1093/bioinformatics/bti1109.
9
ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W299-302. doi: 10.1093/nar/gki370.
10
Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity.
Genome Res. 2005 Jul;15(7):978-86. doi: 10.1101/gr.3804205. Epub 2005 Jun 17.