Detection of protein catalytic sites in the biomedical literature.

Authors

Verspoor Karin, Mackinlay Andrew, Cohn Judith D, Wall Michael E

Affiliation

National ICT Australia, Victoria Research Lab, Parkville, VIC 3010, Australia.

Publication

Pac Symp Biocomput. 2013:433-44.

Abstract

This paper explores the application of text mining to the problem of detecting protein functional sites in the biomedical literature, and specifically considers the task of identifying catalytic sites in that literature. Through an analysis of two corpora in terms of their coverage of curated data sources, we provide strong evidence for the need for text mining techniques that address residue-level protein function annotation. We also explore the viability of building a text-based classifier for identifying protein functional sites, and identify two challenges that must be addressed: the low coverage of curated data sources and the potential ambiguity of information about protein functional sites. Nevertheless, in a first attempt to address this classification task, we produce a simple classifier that achieves a reasonable ~69% F-score on our full-text silver corpus. The work has application in computational prediction of the functional significance of protein sites as well as in curation workflows for databases that capture this information.
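For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-score is computed from classifier counts. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not specify. The function name `f_score` and the counts in the usage line are hypothetical illustrations and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive
    and false-negative counts (beta=1.0 gives the balanced F1)."""
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted sites that are correct
    recall = tp / (tp + fn)     # fraction of true sites that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
example = f_score(tp=8, fp=2, fn=4)
print(f"F1 = {example:.3f}")
```

With these illustrative counts, precision is 0.8 and recall is about 0.667, so the score falls between the two, closer to the lower value.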
