Marchesin Stefano, Giachelle Fabio, Marini Niccolò, Atzori Manfredo, Boytcheva Svetla, Buttafuoco Genziana, Ciompi Francesco, Di Nunzio Giorgio Maria, Fraggetta Filippo, Irrera Ornella, Müller Henning, Primov Todor, Vatrano Simona, Silvello Gianmaria
Department of Information Engineering, University of Padua, Padua, Italy.
Information Systems Institute, University of Applied Sciences Western Switzerland, Delémont, Switzerland.
J Pathol Inform. 2022 Sep 15;13:100139. doi: 10.1016/j.jpi.2022.100139. eCollection 2022.
Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, which encodes medical knowledge that remains largely unexploited. To decode the medical knowledge contained in reports, we propose the Semantic Knowledge Extractor Tool (SKET), an unsupervised knowledge extraction system that combines a rule-based expert system with pre-trained Machine Learning (ML) models. Combining rule-based techniques with pre-trained ML models yields high-accuracy knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques for extracting critical information from cancer reports, opening opportunities such as data mining for knowledge extraction, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical, unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system that provides visual explanations of the algorithmic decisions taken by SKET. SKET X is designed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.