Computer Science and Technology Department, Donghua University, Shanghai, China.
J Biomed Inform. 2023 Jun;142:104371. doi: 10.1016/j.jbi.2023.104371. Epub 2023 May 5.
Accurate and efficient extraction of key disease-related information from medical examination reports, such as those from X-ray, ultrasound, and CT examinations, is crucial for accurate diagnosis and treatment. These reports provide a detailed record of a patient's health condition and are an important part of the clinical examination process. Organizing this information in a structured way allows doctors to review and analyze the data more easily, leading to better patient care. In this paper, we introduce a new technique for extracting useful information from unstructured clinical text examination reports, which we formulate as a medical event extraction (EE) task. Our approach is based on Machine Reading Comprehension (MRC) and involves two sub-tasks: Question Answerability Judgment (QAJ) and Span Selection (SS). We use BERT to build a question answerability discriminator (Judger) that determines whether a reading comprehension question can be answered, thereby avoiding the extraction of arguments for unanswerable questions. The SS sub-task first obtains the encoding of each word in the medical text from the final Transformer layer of BERT, then uses an attention mechanism to identify answer-relevant information in these word encodings. This information is fed into a bidirectional LSTM (BiLSTM) module to obtain a global representation of the text, which is combined with a softmax function to predict the answer span (i.e., the start and end positions of the answer in the text report). Using interpretable methods, we compute Jensen-Shannon Divergence (JSD) scores between layers of the network and confirm that our model has strong word representation capabilities, enabling it to effectively extract contextual information from medical reports. Our experiments show that our method outperforms existing medical event extraction methods, achieving state-of-the-art F1 scores.
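To make the described architecture concrete, the following is a minimal sketch of an MRC-style extractor with the two sub-tasks named in the abstract: a shared BERT encoder, a Judger head for answerability on the [CLS] representation, and a span-selection head that applies attention over the final-layer token encodings, passes the result through a BiLSTM for a global representation, and predicts softmax-normalized start/end positions. All class names, module choices (e.g., multi-head attention, hidden sizes), and the `bert-base-chinese` checkpoint are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the two-sub-task MRC model described in the abstract.
# Assumptions: Hugging Face transformers, PyTorch >= 1.9, hidden size 768.
import torch
import torch.nn as nn
from transformers import BertModel


class MRCEventExtractor(nn.Module):
    def __init__(self, bert_name: str = "bert-base-chinese", hidden: int = 768):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # QAJ ("Judger"): binary classifier on the [CLS] vector (answerable or not).
        self.judger = nn.Linear(hidden, 2)
        # SS: attention over final-layer token encodings, then a BiLSTM.
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.bilstm = nn.LSTM(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.span = nn.Linear(hidden, 2)  # per-token start/end logits

    def forward(self, input_ids, attention_mask, token_type_ids):
        enc = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids).last_hidden_state
        # Question Answerability Judgment from the [CLS] token.
        answerable_logits = self.judger(enc[:, 0])
        # Span Selection: attend to answer-relevant tokens, then model global context.
        key_padding_mask = attention_mask.eq(0)
        attended, _ = self.attn(enc, enc, enc, key_padding_mask=key_padding_mask)
        context, _ = self.bilstm(attended)
        start_logits, end_logits = self.span(context).split(1, dim=-1)
        # Softmax over sequence positions gives start/end distributions.
        start_probs = start_logits.squeeze(-1).softmax(dim=-1)
        end_probs = end_logits.squeeze(-1).softmax(dim=-1)
        return answerable_logits, start_probs, end_probs
```

In this sketch, the Judger's output would gate span extraction at inference time: a question predicted as unanswerable yields no argument, matching the stated goal of avoiding extraction for unanswerable questions.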
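The abstract also mentions computing a JSD score between layers of the network as an interpretability check. Below is a small sketch of one plausible way to do that: compute the Jensen-Shannon Divergence between softmax-normalized token representations drawn from two Transformer layers. The per-token softmax normalization is an assumption; the paper's exact analysis procedure may differ.

```python
# Hedged sketch: Jensen-Shannon Divergence between two layers' token representations.
import torch
import torch.nn.functional as F


def jsd(p: torch.Tensor, q: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """JSD between probability distributions p and q along `dim`."""
    m = 0.5 * (p + q)
    kl_pm = (p * (p.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(dim)
    kl_qm = (q * (q.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(dim)
    return 0.5 * (kl_pm + kl_qm)


def layer_jsd(hidden_states, layer_i: int, layer_j: int) -> torch.Tensor:
    """Average JSD between two layers' token representations, softmaxed per token.

    `hidden_states` is the tuple returned by BERT with output_hidden_states=True.
    """
    p = F.softmax(hidden_states[layer_i], dim=-1)
    q = F.softmax(hidden_states[layer_j], dim=-1)
    return jsd(p, q).mean()
```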