Garten Yael, Tatonetti Nicholas P, Altman Russ B
Stanford Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.
Pac Symp Biocomput. 2010:305-14. doi: 10.1142/9789814295291_0033.
A critical goal of pharmacogenomics research is to identify genes that can explain variation in drug response. We have previously reported a method that creates a genome-scale ranking of genes likely to interact with a drug. The algorithm uses information about drug structure and indications for use to rank the genes. Although the algorithm performs well, it depends on a curated set of drug-gene relationships that is expensive to create and difficult to maintain. In this work, we assess the utility of text mining for automatically extracting a network of drug-gene relationships. The extracted network provides a valuable aggregate source of knowledge, which we then use as input to the algorithm that ranks potential pharmacogenes. We built a drug-gene network from sentence-level co-occurrence in the full text of scientific articles and compared its performance with that of a network created by manual curation of the same articles. Under a wide range of conditions, we show that a knowledge base derived from text-mining the literature performs as well as, and sometimes better than, a high-quality, manually curated knowledge base. We conclude that relationships mined automatically from the literature can serve as a knowledge base for pharmacogenomics relationships. Additionally, when relationships are missed by text mining, our system can extrapolate new relationships with 77.4% precision.
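To make the idea of sentence-level co-occurrence concrete, the following is a minimal sketch and not the authors' implementation. It assumes small hypothetical drug and gene lexicons (`DRUGS`, `GENES`), a naive punctuation-based sentence splitter, and exact token matching; the function names `split_sentences` and `cooccurrence_network` are invented for illustration. Each drug-gene pair mentioned in the same sentence adds one to the weight of the corresponding edge.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons for illustration only; a real system would draw on
# much larger curated dictionaries. Drug names are stored in lowercase.
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"CYP2C9", "VKORC1", "CYP2C19"}


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_network(text, drugs=DRUGS, genes=GENES):
    """Count how many sentences mention each (drug, gene) pair together."""
    genes_by_upper = {g.upper(): g for g in genes}
    edges = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        # Case-insensitive exact matches against each lexicon.
        found_drugs = {t.lower() for t in tokens} & drugs
        found_genes = {
            genes_by_upper[t.upper()] for t in tokens if t.upper() in genes_by_upper
        }
        # Every drug-gene pair in the same sentence counts once per sentence.
        for drug, gene in product(found_drugs, found_genes):
            edges[(drug, gene)] += 1
    return edges


example_text = (
    "Warfarin dose is influenced by CYP2C9 and VKORC1. "
    "Clopidogrel is activated by CYP2C19. "
    "VKORC1 variants alter warfarin sensitivity. "
    "CYP2C19 was genotyped."
)
network = cooccurrence_network(example_text)
for (drug, gene), weight in sorted(network.items()):
    print(drug, gene, weight)
```

In this toy text, the last sentence mentions a gene but no drug and therefore contributes no edge. A weighted edge list of this kind is the sort of drug-gene network that could then be passed to a downstream ranking step; how the original work weights or filters edges is not specified in the abstract.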