PaPI: pseudo amino acid composition to score human protein-coding variants.

Authors

Limongelli Ivan, Marini Simone, Bellazzi Riccardo

Affiliations

IRCCS Policlinico S. Matteo, Pzz.le Volontari del Sangue 2, 27100, Pavia, Italy.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy.

Publication information

BMC Bioinformatics. 2015 Apr 19;16:123. doi: 10.1186/s12859-015-0554-8.

DOI: 10.1186/s12859-015-0554-8
PMID: 25928477
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4411653/
Abstract

BACKGROUND

High-throughput sequencing technologies are able to identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments focus mainly on coding sequence variants involving one or more nucleotides. Analyzing the biological significance of this multitude of genomic variants is challenging and computationally demanding.

RESULTS

We present PaPI, a new machine-learning approach that classifies and scores human coding variants by estimating the probability that they damage the function of the encoded protein. The novelty of the approach lies in its use of pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine-learning classifier was trained on a set of known deleterious and benign coding variants in order to score unobserved variants, taking into account hidden sequence patterns in the human genome that may lead to disease. We show that the combination of amphiphilic pseudo amino acid composition, evolutionary conservation, and homologous-protein-based methods outperforms several prediction algorithms and can also score complex variants such as deletions, insertions, and indels.
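
To make the idea of a discrete sequence model concrete, the following is a minimal sketch of a pseudo amino acid composition vector computed for a wild-type and a mutated sequence. It is not the PaPI implementation: the function names, the toy sequences, the parameters `lam` and `weight`, and the choice of the Kyte-Doolittle hydropathy index as the only physicochemical scale are all assumptions made for illustration. Passing a second scale (for example a hydrophilicity scale) would give the two-scale, amphiphilic-style form mentioned in the abstract.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here only as an example scale
# (an assumption; the abstract does not say which scales PaPI uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sigma for a in AMINO_ACIDS}


def pse_aac(seq, scales, lam=3, weight=0.5):
    """Return a vector of 20 + lam * len(scales) components.

    The first 20 components are residue frequencies; the remaining ones are
    sequence-order correlation factors at lags 1..lam, one per scale.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains a non-standard residue")

    std_scales = [standardize(s) for s in scales]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]

    taus = []
    for k in range(1, lam + 1):
        for s in std_scales:
            # Average product of the property for residues k positions apart.
            products = [s[a] * s[b] for a, b in zip(seq, seq[k:])]
            taus.append(sum(products) / (len(seq) - k))

    denom = sum(freqs) + weight * sum(taus)
    return [f / denom for f in freqs] + [weight * t / denom for t in taus]


# Hypothetical wild-type sequence and a single-residue substitution (R -> P).
wild = "MKTAYIAKQRQISFVKSHFSRQ"
mutant = wild[:9] + "P" + wild[10:]

wild_vec = pse_aac(wild, [KYTE_DOOLITTLE])
mutant_vec = pse_aac(mutant, [KYTE_DOOLITTLE])

# One possible feature for a classifier: the component-wise difference.
delta = [m - w for w, m in zip(wild_vec, mutant_vec)]
```

Because both vectors have the same fixed length regardless of sequence length, the same representation applies to insertions and deletions, where the wild-type and mutated sequences differ in length. How PaPI actually combines such features with conservation and homology-based scores is not specified in the abstract.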

CONCLUSIONS

This paper describes a machine-learning approach to predict the deleteriousness of human coding variants. A freely available web application (http://papi.unipv.it) implementing the method has been developed; it can score up to thousands of variants in a single run.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/be173348dfdb/12859_2015_554_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/6114b5e7e33e/12859_2015_554_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/83681e78f534/12859_2015_554_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/4315f734e119/12859_2015_554_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/a1ca8aab2135/12859_2015_554_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3aa/4411653/04eef7c2f3bd/12859_2015_554_Fig5_HTML.jpg
