Chapter 16: text mining for translational bioinformatics.

Author affiliation

Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA.

Publication information

PLoS Comput Biol. 2013 Apr;9(4):e1003044. doi: 10.1371/journal.pcbi.1003044. Epub 2013 Apr 25.

DOI: 10.1371/journal.pcbi.1003044

PMID: 23633944

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3635962/

Abstract

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both categories of translational research: T1 (translating basic science results into new interventions) and T2 (translational research for public health). Potential use cases include better phenotyping of research subjects and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based, and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.




