Xu Rong, Wang Quanqiu
Medical Informatics Division, Case Western Reserve University, Cleveland, OH 44106, USA.
Int J Comput Biol Drug Des. 2013;6(1-2):18-31. doi: 10.1504/IJCBDD.2013.052199. Epub 2013 Feb 21.
Pharmacogenomics (PGx) studies aim to identify genetic variants that may affect drug efficacy and toxicity. Machine-understandable knowledge of drug-gene relationships is important for many computational PGx studies and for personalised medicine. The scientific literature is a rich knowledge source for PGx studies, and a comprehensive and accurate PGx-specific gene lexicon is important for automatically extracting drug-gene relationships from it. In this study, we present a bootstrapping learning technique that ranks 33,310 human genes by their relevance to drug response. Starting from only one seed PGx gene, the algorithm iteratively extracts and ranks co-occurring genes using 20 million MEDLINE abstracts. Our ranking method accurately places PGx-specific genes near the top of the ranking of all human genes. In ranking the top 2.5% of genes, the algorithm achieved significantly better performance (precision: 0.861, recall: 0.548, F1: 0.662) than randomly ranked genes (precision: 0.032, recall: 0.013, F1: 0.018).
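The abstract does not spell out the scoring function, so the following is only a minimal sketch of one way seed-based bootstrapping over gene co-occurrence could work. It assumes each abstract has already been reduced to the set of gene symbols it mentions (gene-name recognition is not shown), and it scores candidates by summed co-occurrence counts with the genes accepted so far; that scoring rule is an illustrative assumption, not the authors' method. The names `bootstrap_rank` and `precision_recall_f1`, the `iterations` and `expand` parameters, and the toy gene labels are all hypothetical. The sketch also shows how precision, recall and F1 over the top k ranked genes are computed; the abstract's cut-off of the top 2.5% of 33,310 genes corresponds to roughly 833 genes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts):
    """Count how many abstracts mention each unordered pair of genes."""
    pairs = Counter()
    for genes in abstracts:
        for a, b in combinations(sorted(set(genes)), 2):
            pairs[(a, b)] += 1
    return pairs


def bootstrap_rank(abstracts, seed, iterations=10, expand=1):
    """Hypothetical bootstrapping sketch (not the authors' scoring function).

    Starting from a single seed gene, each iteration scores every gene not yet
    accepted by its summed co-occurrence count with the accepted genes, then
    accepts the `expand` highest-scoring genes. Acceptance order is the ranking.
    """
    pairs = cooccurrence_counts(abstracts)
    ranked, accepted = [seed], {seed}
    for _ in range(iterations):
        scores = Counter()
        for (a, b), n in pairs.items():
            if a in accepted and b not in accepted:
                scores[b] += n
            elif b in accepted and a not in accepted:
                scores[a] += n
        # Highest score first; ties broken alphabetically so output is deterministic.
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:expand]
        if not best:  # no remaining gene co-occurs with the accepted set
            break
        for gene, _ in best:
            ranked.append(gene)
            accepted.add(gene)
    return ranked


def precision_recall_f1(ranked, gold, k):
    """Precision, recall and F1 of the top-k ranked genes against a gold PGx gene set."""
    top = set(ranked[:k])
    tp = len(top & gold)
    precision = tp / len(top) if top else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy corpus (hypothetical): each set holds the genes mentioned in one abstract.
TOY_ABSTRACTS = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"C", "D"}, {"E", "F"}]

if __name__ == "__main__":
    ranking = bootstrap_rank(TOY_ABSTRACTS, seed="A")
    print(ranking)  # ['A', 'B', 'C', 'D']
    print(precision_recall_f1(ranking, gold={"B", "C", "F"}, k=2))
```

On the toy corpus, genes E and F never co-occur with any accepted gene, so this sketch leaves them unranked; a full ranking of all human genes, as in the study, would have to place such genes after the reached ones.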