Xu Rong, Wang QuanQiu
Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, United States.
ThinTek, LLC, Palo Alto, CA 94306, United States.
J Biomed Inform. 2015 Feb;53:128-35. doi: 10.1016/j.jbi.2014.10.002. Epub 2014 Oct 13.
Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity prediction, and drug repositioning. In this study, we present a two-step approach that combines table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consist of 31,255 tables downloaded from the Journal of Clinical Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and SE-unrelated categories. We then extracted drug-SE pairs from the SE-related tables. We compared the drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed relationships between anticancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and SE-unrelated categories (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and an F1 of 0.520. Drug-SE pairs extracted from JCO tables are largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables are not included in a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications.
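For readers less familiar with the evaluation metrics quoted above, the following is a minimal sketch of how an F1 score relates to precision and recall, assuming the standard definition of F1 as their harmonic mean. The helper name `f1_score` is hypothetical and is not taken from the paper; the abstract does not state how the reported scores were computed or averaged.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper, not from the paper)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Table classification figures quoted in the abstract.
table_f1 = f1_score(0.711, 0.941)
print(f"table classification F1: {table_f1:.3f}")

# Drug-SE pair extraction figures quoted in the abstract.
pair_f1 = f1_score(0.605, 0.460)
print(f"drug-SE extraction F1: {pair_f1:.2f}")
```

With the table classification figures this gives 0.810, matching the reported value; with the pair extraction figures it gives roughly 0.52.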