Du Jian, Li Xiaoying
National Institute of Health Data Science, Peking University, Beijing, China.
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China.
JMIR Med Inform. 2020 Apr 28;8(4):e18323. doi: 10.2196/18323.
Combination therapy plays an important role in the effective treatment of malignant neoplasms and in precision medicine. Numerous clinical studies have been carried out to investigate combination drug therapies. Automated knowledge discovery of these combinations and their graphic representation in knowledge graphs will enable pattern recognition and the identification of drug combinations used to treat a specific type of cancer, and may thereby improve drug efficacy and the treatment of human disorders.
This paper aims to develop an automated, visual approach to discover knowledge about combination therapies from biomedical literature, especially from those studies with high-level evidence such as clinical trial reports and clinical practice guidelines.
Based on semantic predications, which consist of a triple structure of subject-predicate-object (SPO), we proposed an automated algorithm to discover knowledge of combination drug therapies using the following rules: 1) two or more semantic predications (S1-P-O and Si-P-O, i = 2, 3, …) can be extracted from one conclusive claim (sentence) in the abstract of a given publication, and 2) these predications have an identical predicate (one that closely relates to human disease treatment, eg, "treat") and an identical object (eg, a disease name) but different subjects (eg, drug names). A customized knowledge graph organizes and visualizes these combinations, improving on traditional semantic triples. After automatic filtering of broad concepts such as "pharmacologic actions" and generic disease names, a set of combination drug therapies was identified and characterized through manual interpretation.
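The two rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Predication` record, the `find_combinations` function, the `TREATMENT_PREDICATES` set, and the `BROAD_CONCEPTS` stop list are all hypothetical names, and the sample predications are made up for the example rather than taken from the study.

```python
from collections import defaultdict
from typing import NamedTuple


class Predication(NamedTuple):
    """One subject-predicate-object triple tied to its source sentence (hypothetical record)."""
    sentence_id: str
    subject: str
    predicate: str
    obj: str


# Assumption: treatment-related predicates are identified by name.
TREATMENT_PREDICATES = {"TREATS"}
# Assumption: an illustrative stop list standing in for the broad-concept filter.
BROAD_CONCEPTS = {"Pharmacologic Actions", "Antineoplastic Agents", "Disease"}


def find_combinations(predications):
    """Group predications by (sentence, predicate, object) and keep groups
    that contain at least two different subjects."""
    groups = defaultdict(set)
    for p in predications:
        if p.predicate not in TREATMENT_PREDICATES:
            continue  # rule 2: the predicate must relate to treatment
        if p.subject in BROAD_CONCEPTS or p.obj in BROAD_CONCEPTS:
            continue  # drop overly broad drug or disease concepts
        groups[(p.sentence_id, p.predicate, p.obj)].add(p.subject)
    # rule 1: two or more predications from one conclusive claim
    return {key: sorted(subjects) for key, subjects in groups.items() if len(subjects) >= 2}


# Made-up input for illustration only.
example = [
    Predication("s1", "Bevacizumab", "TREATS", "Colorectal Carcinoma"),
    Predication("s1", "Oxaliplatin", "TREATS", "Colorectal Carcinoma"),
    Predication("s1", "Pharmacologic Actions", "TREATS", "Colorectal Carcinoma"),
    Predication("s2", "Imatinib", "TREATS", "Gastrointestinal Stromal Tumors"),
]
combinations = find_combinations(example)
```

In this sketch, sentence `s1` yields one candidate group with two subjects, while `s2` is dropped because it has only a single subject. Grouping by sentence first keeps drugs mentioned in different claims from being paired by accident.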
We retrieved 22,263 clinical trial reports and 31 clinical practice guidelines from PubMed abstracts by searching "antineoplastic agents" as the drug restriction (published between January 2009 and October 2019). Using the search terms "conclusion*" and "conclude*," we locally parsed 15,603 conclusive claims for semantic predication extraction by SemRep, and 325 candidate groups of semantic predications about combined medications were automatically discovered within 316 conclusive claims. Based on manual analysis, we determined that 255/316 claims (78.46%) were accurately identified as describing combination therapies and adopted these to construct the customized knowledge graph. We also identified 2 categories (and 4 subcategories) to characterize the inaccurate results: limitations of SemRep and limitations of the proposed approach. We further learned the predominant patterns of drug combinations based on mechanism of action for new combined medication studies and discovered 4 obvious markers ("combin*," "coadministration," "co-administered," and "regimen") that identify potential combination therapies and could support the development of a machine learning algorithm.
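The search terms and the 4 markers reported above translate naturally into regular expressions. The following is a rough sketch, assuming case-insensitive matching and treating the trailing `*` as "any word ending"; the names `CONCLUSIVE_PATTERN`, `MARKER_PATTERN`, `is_conclusive`, and `find_markers` are hypothetical, and the sample sentence is invented for illustration.

```python
import re

# Assumption: "conclusion*" and "conclude*" are prefix wildcards on a single word.
CONCLUSIVE_PATTERN = re.compile(r"\b(?:conclusion\w*|conclude\w*)\b", re.IGNORECASE)

# The 4 markers from the text; only "combin*" carries a wildcard, so the
# other three are matched as whole words exactly as written.
MARKER_PATTERN = re.compile(
    r"\b(?:combin\w*|coadministration|co-administered|regimen)\b",
    re.IGNORECASE,
)


def is_conclusive(sentence: str) -> bool:
    """Return True if the sentence contains a conclusion-type search term."""
    return CONCLUSIVE_PATTERN.search(sentence) is not None


def find_markers(sentence: str) -> list[str]:
    """Return the combination-therapy markers found in the sentence, lowercased."""
    return [m.group(0).lower() for m in MARKER_PATTERN.finditer(sentence)]


# Invented sentence for illustration only.
sample = "CONCLUSIONS: Combined therapy with a two-drug regimen was well tolerated."
sample_is_conclusive = is_conclusive(sample)
sample_markers = find_markers(sample)
```

Because "regimen" is matched as a whole word here, the plural "regimens" would not be caught; whether to widen that pattern is a design choice the source does not settle. Marker hits like these could serve as input features for the machine learning approach the abstract calls for.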
Semantic predications from conclusive claims in the biomedical literature can be used to support automated knowledge discovery and knowledge graph construction for combination therapies. A machine learning approach is warranted to take full advantage of the identified markers and other contextual features.