Biomedical Informatics, Stanford University, Stanford, CA 94305, USA.
Pharmacogenomics. 2010 Oct;11(10):1467-89. doi: 10.2217/pgs.10.136.
The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. To integrate our knowledge, connect important facts across publications, and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some of it specific to the pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications, and anticipate the next phase of text mining applications.