Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.

Authors

Dobson Richard J, Munroe Patricia B, Caulfield Mark J, Saqi Mansoor AS

Affiliation

Clinical Pharmacology, The William Harvey Research Institute, Bart's and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.

Publication

BMC Bioinformatics. 2006 Apr 21;7:217. doi: 10.1186/1471-2105-7-217.

DOI:10.1186/1471-2105-7-217
PMID:16630345
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1489951/
Abstract

BACKGROUND

There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl.
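
The balancing step mentioned in the background can be illustrated with a minimal sketch. This is not the authors' code: the function name `undersample_majority`, the toy identifiers, the class labels, and the fixed seed are all hypothetical. It assumes that full undersampling means randomly discarding majority-class examples until every class is as large as the smallest one, which is one common reading of the term.

```python
import random
from collections import defaultdict


def undersample_majority(samples, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest class (assumed interpretation)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The smallest class sets the size every class is reduced to.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        kept = rng.sample(group, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


# Toy data (hypothetical): 2 disease-associated vs 6 neutral nsSNPs.
samples = [f"snp{i}" for i in range(8)]
labels = ["disease"] * 2 + ["neutral"] * 6
balanced = undersample_majority(samples, labels)
```

After this call, `balanced` holds four `(sample, label)` pairs, two per class. The trade-off of this design is that majority-class examples are thrown away, so the classifier sees less data in exchange for a balanced class distribution.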

RESULTS

The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value.
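
For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it from the four cells of a binary confusion matrix using the standard definition. The function name `mcc` and the counts are hypothetical illustrations, not figures from the paper; returning 0.0 when the denominator is zero is a common convention and is an assumption here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# A classifier that always predicts the majority class on a 90/10 split:
# accuracy is 0.9, yet MCC is 0.0 because no negatives are ever predicted.
always_majority = mcc(tp=90, tn=0, fp=10, fn=0)

# A classifier evaluated on a balanced set of 100 examples (made-up counts).
balanced_case = mcc(tp=40, tn=35, fp=15, fn=10)
```

The first call shows why MCC is a more informative measure than accuracy on imbalanced data: a trivial majority-class predictor scores well on accuracy but gets an MCC of 0.0.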

CONCLUSION

The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fd9/1489951/00f299d56885/1471-2105-7-217-1.jpg
