Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology.

Author information

Ross Karen E, Natale Darren A, Arighi Cecilia, Chen Sheng-Chih, Huang Hongzhan, Li Gang, Ren Jia, Wang Michael, Vijay-Shanker K, Wu Cathy H

Affiliations

Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA.

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.

Publication information

CEUR Workshop Proc. 2016 Aug;1747. Epub 2016 Nov 29.

PMID:28706471

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5504912/

Abstract

The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
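The abstract describes the pipeline only at a high level. As a minimal sketch, assuming text-mined phosphorylation mentions have already been normalized to protein identifiers, the Python below shows one way mentions could be collapsed into unique substrate-site pairs that carry their literature evidence and any associated kinases before term generation. `Mention`, `aggregate_pairs`, `candidate_terms`, the `min_evidence` threshold, the placeholder identifiers, and the label format are all hypothetical; they are not the RLIMS-P, eFIP, iPTMnet, or PRO interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Mention:
    """One hypothetical text-mined phosphorylation mention (RLIMS-P-style output)."""
    substrate: str          # substrate identifier, assumed already normalized
    site: str               # residue + position, e.g. "S15"
    kinase: Optional[str]   # kinase identifier if the sentence names one, else None
    doc_id: str             # literature identifier used as evidence


def aggregate_pairs(
    mentions: Iterable[Mention],
) -> Dict[Tuple[str, str], Dict[str, Set[str]]]:
    """Collapse mentions into unique substrate-site pairs with evidence and kinases."""
    pairs: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(
        lambda: {"evidence": set(), "kinases": set()}
    )
    for m in mentions:
        entry = pairs[(m.substrate, m.site)]
        entry["evidence"].add(m.doc_id)
        if m.kinase is not None:
            entry["kinases"].add(m.kinase)
    return dict(pairs)


def candidate_terms(pairs, min_evidence: int = 1):
    """Yield (hypothetical label, evidence, kinases) for sufficiently supported pairs."""
    for (substrate, site), entry in sorted(pairs.items()):
        if len(entry["evidence"]) >= min_evidence:
            yield (
                f"{substrate} phosphorylated {site}",
                sorted(entry["evidence"]),
                sorted(entry["kinases"]),
            )


# Usage with placeholder identifiers (not real data).
mentions = [
    Mention("SUB1", "S15", "KIN1", "doc1"),
    Mention("SUB1", "S15", None, "doc2"),
    Mention("SUB2", "T231", None, "doc1"),
]
pairs = aggregate_pairs(mentions)
for term in candidate_terms(pairs, min_evidence=2):
    print(term)  # ('SUB1 phosphorylated S15', ['doc1', 'doc2'], ['KIN1'])
```

Keying on the (substrate, site) pair mirrors the abstract's unit of output ("substrate-site pairs") and keeps kinase and evidence information attached to each candidate proteoform; the evidence threshold is an illustrative knob, not a value from the paper.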
