Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology.

Author information

Ross Karen E, Natale Darren A, Arighi Cecilia, Chen Sheng-Chih, Huang Hongzhan, Li Gang, Ren Jia, Wang Michael, Vijay-Shanker K, Wu Cathy H

Affiliations

Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA.

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.

Publication information

CEUR Workshop Proc. 2016 Aug;1747. Epub 2016 Nov 29.

Abstract

The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creating PTM proteoform classes that combines two phosphorylation-focused text mining tools, RLIMS-P (which detects mentions of kinases, substrates, and phosphorylation sites) and eFIP (which detects phosphorylation-dependent protein-protein interactions, or PPIs), with our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1), derived from our dataset, that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
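The abstract does not spell out how text-mined mentions become candidate substrate-site pairs, so the following is only a minimal sketch of one plausible aggregation step, not the authors' pipeline. It assumes each RLIMS-P-style mention has already been mapped to a substrate accession, residue, position, and PubMed ID. The record shapes (`Mention`, `SitePair`), the functions `aggregate` and `is_curatable`, and the sequence-residue check used as a "suitability" filter are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """One text-mined phosphorylation mention (hypothetical record shape)."""
    uniprot_ac: str               # substrate accession, assumed already normalized
    residue: str                  # phosphorylated residue: 'S', 'T' or 'Y'
    position: int                 # 1-based residue number in the canonical sequence
    pmid: str                     # supporting article
    kinase: Optional[str] = None  # kinase named in the same mention, if any


@dataclass
class SitePair:
    """A substrate-site pair with evidence pooled across mentions."""
    uniprot_ac: str
    residue: str
    position: int
    pmids: set = field(default_factory=set)
    kinases: set = field(default_factory=set)


def aggregate(mentions):
    """Collapse mentions into unique substrate-site pairs, pooling PMIDs and kinases."""
    pairs = {}
    for m in mentions:
        key = (m.uniprot_ac, m.residue, m.position)
        pair = pairs.get(key)
        if pair is None:
            pair = pairs[key] = SitePair(*key)
        pair.pmids.add(m.pmid)
        if m.kinase:
            pair.kinases.add(m.kinase)
    return list(pairs.values())  # insertion order = first-seen order


def is_curatable(pair, sequences, min_pmids=1):
    """Hypothetical filter: the site must exist, match the sequence residue,
    and carry at least `min_pmids` supporting articles."""
    seq = sequences.get(pair.uniprot_ac)
    if seq is None or not 1 <= pair.position <= len(seq):
        return False
    return seq[pair.position - 1] == pair.residue and len(pair.pmids) >= min_pmids
```

With toy data (the accession `TOY1` and its sequence are invented), two mentions of the same serine merge into one pair whose evidence covers both articles, while a mention whose residue disagrees with the sequence is rejected. Keying on accession, residue, and position mirrors the abstract's unit of "substrate-site pairs"; kinase and PPI information rides along as optional annotation rather than being part of the key.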
