Arguello-Casteleiro Mercedes, Stevens Robert, Des-Diz Julio, Wroe Chris, Fernandez-Prieto Maria Jesus, Maroto Nava, Maseda-Fernandez Diego, Demetriou George, Peters Simon, Noble Peter-John M, Jones Phil H, Dukes-McEwan Jo, Radford Alan D, Keane John, Nenadic Goran
School of Computer Science, University of Manchester, Manchester, UK.
Hospital do Salnés, Villagarcía de Arousa, Pontevedra, Spain.
J Biomed Semantics. 2019 Nov 12;10(Suppl 1):22. doi: 10.1186/s13326-019-0212-6.
Deep Learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, validating and integrating this knowledge at scale requires cross-checking against ground truths (i.e. evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies and other clinical concepts into computable knowledge using free-text data about human and animal health.
We used a Semantic Deep Learning approach that combines Semantic Web technologies and Deep Learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300K PubMed Systematic Review articles (the PMSB dataset) and 2.5M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts using two deep learning methods applied separately to the two datasets, resulting in 880 term pairs (target term, candidate term), i.e. 11 conditions × 20 concepts × 2 methods × 2 datasets. Each concept, represented by an n-gram, was mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g. abbreviations and acronyms). Existing ontologies were used to formally represent associations. We also created ontological modules and illustrated how the extracted knowledge can be queried. The evaluation was performed using the content within BMJ Best Practice.
MetaMap achieves an F-measure of 88% (precision 85%, recall 91%) when applied directly to the 613 unique candidate terms in the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F-measure of 94% (precision 92%, recall 96%). Validation of the term pairs with BMJ Best Practice yields precision between 98% and 99%.
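The reported F-measures can be checked from the stated precision and recall. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is purely illustrative. Under that assumption, the rounded values agree with the figures given above.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not state which F-measure variant
    was used; F1 is taken here because it matches the reported values.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
direct = f_measure(0.85, 0.91)            # MetaMap applied directly
with_short_forms = f_measure(0.92, 0.96)  # with short-form processing

print(f"MetaMap alone:    {direct:.1%}")
print(f"With short forms: {with_short_forms:.1%}")
```

Rounded to whole percentages, the two results are 88% and 94%.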
The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge using ontologies and content from BMJ Best Practice.