Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA.
PLoS Comput Biol. 2013 Apr;9(4):e1003044. doi: 10.1371/journal.pcbi.1003044. Epub 2013 Apr 25.
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both the category of T1 translational research (translating basic science results into new interventions) and that of T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.