Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences.

Author information

Audain Enrique, Ramos Yassel, Hermjakob Henning, Flower Darren R, Perez-Riverol Yasset

Affiliations

Department of Proteomics, Center of Molecular Immunology.

Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba.

Publication information

Bioinformatics. 2016 Mar 15;32(6):821-7. doi: 10.1093/bioinformatics/btv674. Epub 2015 Nov 14.

PMID:26568629

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5939969/

Abstract

MOTIVATION

In any macromolecular polyprotic system (for example, a protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge (and thus the electrophoretic mobility) of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, in which discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analyses. While pI calculation is widely used, it remains largely untested, which motivated our efforts to benchmark pI prediction methods.
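Sequence-based calculators commonly make this definition operational with a Henderson-Hasselbalch model of the net charge. The sketch below assumes that each ionizable group titrates independently with a fixed pKa taken from a chosen basis set, where $n_k$ is the number of groups of type $k$, $\mathcal{P}$ = {N-terminus, Lys, Arg, His} carry +1 when protonated, and $\mathcal{N}$ = {C-terminus, Asp, Glu, Cys, Tyr} carry -1 when deprotonated (a common, but not the only, choice of groups):

```latex
Z(\mathrm{pH}) = \sum_{k \in \mathcal{P}} \frac{n_k}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_k}}
               - \sum_{k \in \mathcal{N}} \frac{n_k}{1 + 10^{\,\mathrm{p}K_k - \mathrm{pH}}},
\qquad
Z(\mathrm{pI}) = 0
```

Under this model every positive term falls and every negative term grows as pH rises, so $Z$ decreases monotonically and has a single zero, which an iterative search can locate; the resulting pI depends directly on which $\mathrm{p}K_k$ values are used.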

RESULTS

Using data from the PIP-DB database and one publicly available dataset as our reference gold standard, we benchmarked pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set (the set of pKa values used). The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance depends strongly on the quality of those data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to incorporate new features to improve prediction accuracy.
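Iterative pI methods typically search for the pH at which the Henderson-Hasselbalch net charge of the sequence is zero. The following is a minimal sketch under stated assumptions, not the benchmarked implementations: it uses bisection as the root-finding step, treats each ionizable group independently, and uses two hypothetical pKa sets (`PKA_SET_A`, `PKA_SET_B`) whose values are illustrative only and are not the basis sets evaluated in the paper. Running one peptide through both sets shows how the predicted pI shifts with the choice of basis set.

```python
# Minimal sketch of an iterative (bisection) pI calculator.
# ASSUMPTION: both pKa sets are hypothetical, illustrative values; they are
# NOT the basis sets benchmarked in the paper.
PKA_SET_A = {"Nterm": 8.6, "Cterm": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
             "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_SET_B = {"Nterm": 9.7, "Cterm": 2.3, "K": 10.5, "R": 12.4, "H": 6.0,
             "D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}

POSITIVE = ("K", "R", "H")        # side chains carrying +1 when protonated
NEGATIVE = ("D", "E", "C", "Y")   # side chains carrying -1 when deprotonated


def net_charge(seq: str, ph: float, pka: dict) -> float:
    """Net charge at a given pH, summing independent Henderson-Hasselbalch terms."""
    charge = 1.0 / (1.0 + 10 ** (ph - pka["Nterm"]))   # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (pka["Cterm"] - ph))  # free C-terminus
    for aa in POSITIVE:
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka[aa]))
    for aa in NEGATIVE:
        charge -= seq.count(aa) / (1.0 + 10 ** (pka[aa] - ph))
    return charge


def isoelectric_point(seq: str, pka: dict, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; valid because net charge falls monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, pka) > 0:
            lo = mid  # still net positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "ACDEHKRY"  # hypothetical example peptide
    for name, pka in (("set A", PKA_SET_A), ("set B", PKA_SET_B)):
        print(name, round(isoelectric_point(peptide, pka), 2))
```

Bisection is used here only because it is simple and guaranteed to converge for a monotonic charge curve; other iterative schemes would find the same zero.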

CONTACT

yperez@ebi.ac.uk

AVAILABILITY AND IMPLEMENTATION

The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.

Similar articles

Isoelectric point optimization using peptide descriptors and support vector machines.

J Proteomics. 2012 Apr 3;75(7):2269-74. doi: 10.1016/j.jprot.2012.01.029. Epub 2012 Feb 3.

Evaluating preparative isoelectric focusing of complex peptide mixtures for tandem mass spectrometry-based proteomics: a case study in profiling chromatin-enriched subcellular fractions in Saccharomyces cerevisiae.

Anal Chem. 2005 May 15;77(10):3198-207. doi: 10.1021/ac0482256.

IPC - Isoelectric Point Calculator.

Biol Direct. 2016 Oct 21;11(1):55. doi: 10.1186/s13062-016-0159-9.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

High speed two-dimensional protein separation without gel by isoelectric focusing-asymmetrical flow field flow fractionation: application to urinary proteome.

J Proteome Res. 2009 Sep;8(9):4272-8. doi: 10.1021/pr900363s.

Interrogation of MS/MS search data with an pI Filter algorithm to increase protein identification success.

Electrophoresis. 2007 Jun;28(12):1867-74. doi: 10.1002/elps.200700022.

Shotgun proteomics: a qualitative approach applying isoelectric focusing on immobilized pH gradient and LC-MS/MS.

Methods Mol Biol. 2011;681:449-58. doi: 10.1007/978-1-60761-913-0_26.

PIP-DB: the Protein Isoelectric Point database.

Bioinformatics. 2015 Jan 15;31(2):295-6. doi: 10.1093/bioinformatics/btu637. Epub 2014 Sep 23.

Proteome-pI: proteome isoelectric point database.

Nucleic Acids Res. 2017 Jan 4;45(D1):D1112-D1116. doi: 10.1093/nar/gkw978. Epub 2016 Oct 26.

Cited by

Genome-wide identification and expression analysis of orphan genes in twelve (sub)species.

3 Biotech. 2025 Feb;15(2):41. doi: 10.1007/s13205-025-04213-9. Epub 2025 Jan 14.

Bioinformatics-Driven mRNA-Based Vaccine Design for Controlling Tinea Cruris Induced by .

Pharmaceutics. 2024 Jul 25;16(8):983. doi: 10.3390/pharmaceutics16080983.

Deepening insights into cholinergic agents for intraocular pressure reduction: systems genetics, molecular modeling, and perspectives.

Front Mol Biosci. 2024 Jul 26;11:1423351. doi: 10.3389/fmolb.2024.1423351. eCollection 2024.

Unveiling cytokine charge disparity as a potential mechanism for immune regulation.

Cytokine Growth Factor Rev. 2024 Jun;77:1-14. doi: 10.1016/j.cytogfr.2023.12.002. Epub 2023 Dec 26.

A deep learning based ensemble approach for protein allergen classification.

PeerJ Comput Sci. 2023 Oct 12;9:e1622. doi: 10.7717/peerj-cs.1622. eCollection 2023.

Mapping diversity in African trypanosomes using high resolution spatial proteomics.

Nat Commun. 2023 Jul 21;14(1):4401. doi: 10.1038/s41467-023-40125-z.

Molecular Characterization of Germin-like Protein Genes in () Using Various Approaches.

ACS Omega. 2023 Apr 26;8(18):16327-16344. doi: 10.1021/acsomega.3c01104. eCollection 2023 May 9.

Comparative In Silico Analysis and Functional Characterization of TANK-Binding Kinase 1-Binding Protein 1.

Bioinform Biol Insights. 2023 Apr 2;17:11779322231164828. doi: 10.1177/11779322231164828. eCollection 2023.

Site Identification by Ligand Competitive Saturation-Biologics Approach for Structure-Based Protein Charge Prediction.

Mol Pharm. 2023 May 1;20(5):2600-2611. doi: 10.1021/acs.molpharmaceut.3c00064. Epub 2023 Apr 5.

Caseins: Versatility of Their Micellar Organization in Relation to the Functional and Nutritional Properties of Milk.

Molecules. 2023 Feb 21;28(5):2023. doi: 10.3390/molecules28052023.

References

PIP-DB: the Protein Isoelectric Point database.

Bioinformatics. 2015 Jan 15;31(2):295-6. doi: 10.1093/bioinformatics/btu637. Epub 2014 Sep 23.

On best practices in the development of bioinformatics software.

Front Genet. 2014 Jul 2;5:199. doi: 10.3389/fgene.2014.00199. eCollection 2014.

A survey of molecular descriptors used in mass spectrometry based proteomics.

Curr Top Med Chem. 2014;14(3):388-97. doi: 10.2174/1568026613666131204113537.

HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics.

Nat Methods. 2014 Jan;11(1):59-62. doi: 10.1038/nmeth.2732. Epub 2013 Nov 17.

Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.

Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):63-76. doi: 10.1016/j.bbapap.2013.02.032. Epub 2013 Mar 1.

Computational proteomics pitfalls and challenges: HavanaBioinfo 2012 workshop report.

J Proteomics. 2013 Jul 11;87:134-8. doi: 10.1016/j.jprot.2013.01.019. Epub 2013 Jan 29.

A parallel systematic-Monte Carlo algorithm for exploring conformational space.

Curr Top Med Chem. 2012;12(16):1790-6.

Peptide fractionation by SDS-free polyacrylamide gel electrophoresis for proteomic analysis via DF-PAGE.

Methods Mol Biol. 2012;869:197-204. doi: 10.1007/978-1-61779-821-4_16.

Isoelectric point optimization using peptide descriptors and support vector machines.

J Proteomics. 2012 Apr 3;75(7):2269-74. doi: 10.1016/j.jprot.2012.01.029. Epub 2012 Feb 3.

In silico analysis of accurate proteomics, complemented by selective isolation of peptides.

J Proteomics. 2011 Sep 6;74(10):2071-82. doi: 10.1016/j.jprot.2011.05.034. Epub 2011 May 27.
