Hu ZZ, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH
Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057, USA.
Bioinformatics. 2005 Jun 1;21(11):2759-65. doi: 10.1093/bioinformatics/bti390. Epub 2005 Apr 6.
A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. Although this information is of great value, little of it reaches databases because literature-based curation is laborious. Computational literature mining holds promise for facilitating database curation.
A rule-based system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was used to extract protein phosphorylation information from MEDLINE abstracts. An annotation-tagged literature corpus developed at PIR (Protein Information Resource) was used to evaluate the system on two tasks: finding phosphorylation papers, and extracting phosphorylation objects (kinases, substrates and sites) from abstracts. RLIMS-P achieved a precision and recall of 91.4% and 96.4% for paper retrieval, and of 97.9% and 88.0% for extraction of substrates and sites. By coupling high recall for paper retrieval with high precision for information extraction, RLIMS-P facilitates literature mining and database annotation of protein phosphorylation.
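To make "rule-based extraction of kinases, substrates and sites" concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the abstract does not describe RLIMS-P's actual rules. The two regular-expression patterns (one active voice, one passive), the single-token protein-name assumption, and the hypothetical names `Phospho`, `RULES` and `extract` are all invented for this example.

```python
import re
from typing import NamedTuple, Optional


class Phospho(NamedTuple):
    """One extracted phosphorylation event (hypothetical record type)."""
    kinase: Optional[str]
    substrate: str
    site: Optional[str]


# Assumption: protein names are single tokens such as "PKA", "CREB", "GSK3".
PROTEIN = r"[A-Za-z][\w-]*"
# Assumption: sites are written as a residue name plus a position, e.g. "Ser-133".
SITE = r"(?:Ser|Thr|Tyr|serine|threonine|tyrosine)[- ]?\d+"

RULES = [
    # Active voice: "<kinase> phosphorylates <substrate> [at|on <site>]"
    re.compile(
        rf"\b(?P<kinase>{PROTEIN}) phosphorylates (?P<substrate>{PROTEIN})"
        rf"(?: (?:at|on) (?P<site>{SITE}))?"
    ),
    # Passive voice: "<substrate> is|was phosphorylated [at|on <site>] [by <kinase>]"
    re.compile(
        rf"\b(?P<substrate>{PROTEIN}) (?:is|was) phosphorylated"
        rf"(?: (?:at|on) (?P<site>{SITE}))?"
        rf"(?: by (?P<kinase>{PROTEIN}))?"
    ),
]


def extract(sentence: str) -> list[Phospho]:
    """Apply every rule to one sentence and collect the matched events.

    Optional groups that did not participate in a match come back as None,
    e.g. a passive sentence that never names the kinase.
    """
    results = []
    for rule in RULES:
        for m in rule.finditer(sentence):
            results.append(Phospho(m.group("kinase"), m.group("substrate"), m.group("site")))
    return results


if __name__ == "__main__":
    for s in ["PKA phosphorylates CREB at Ser-133.",
              "CREB is phosphorylated at Ser-133 by PKA.",
              "PKA binds CREB."]:
        print(s, "->", extract(s))
```

The point of the sketch is the general shape of the technique: each rule pairs a surface pattern with the argument roles it fills. Because a rule fires only when its pattern matches exactly, this style tends to favor precision over recall. A practical system would also need protein-name recognition, noun-phrase handling (for example, "the CREB protein"), and many more verb and nominal forms than these two patterns cover.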