Sorbonne Université, Université Sorbonne Paris Nord, INSERM, LIMICS, Paris, France.
AP-HP, Henri Mondor, Department of Medical Oncology, Creteil, France.
Stud Health Technol Inform. 2024 Aug 22;316:1861-1865. doi: 10.3233/SHTI240794.
Using clinical decision support systems (CDSSs) for breast cancer management requires extracting relevant patient data from textual reports, a complex task that machine learning methods perform efficiently but as black boxes. We proposed a rule-based natural language processing (NLP) method to automate the translation of breast cancer patient summaries into structured patient profiles suitable for input into the guideline-based CDSS of the DESIREE project. Our method combines named entity recognition (NER), relation extraction, and structured data extraction to systematically organize patient data. Treatment recommendations generated from the extracted profiles aligned strongly with those generated for manually created patient profiles (gold standard), with only 2% differences. Moreover, the NER pipeline achieved an average F1-score of 0.9 across the main entities (patient, side, and tumor), 0.87 for relation extraction, and 0.75 for contextual information, showing promising results for rule-based NLP.
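The pipeline stages named in the abstract (entity recognition, relation extraction, F1 evaluation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expressions, the `Entity` class, the nearest-mention linking heuristic, and the function names are hypothetical and are not taken from the DESIREE rule set or from the authors' implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not the authors' rules.
PATTERNS = {
    "SIDE": re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE),
    "TUMOR": re.compile(r"\b(tumou?r|carcinoma|lesion|mass)\b", re.IGNORECASE),
    "SIZE_MM": re.compile(r"\b(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE),
}


@dataclass
class Entity:
    label: str
    text: str
    start: int
    end: int


def extract_entities(text):
    """Rule-based NER: apply every pattern and return entities in text order."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(label, match.group(0), match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


def link_tumors_to_sides(entities):
    """Relation extraction (assumed heuristic): pair each TUMOR with the
    SIDE mention that is closest in character distance."""
    sides = [e for e in entities if e.label == "SIDE"]
    relations = []
    for tumor in (e for e in entities if e.label == "TUMOR"):
        if sides:
            nearest = min(sides, key=lambda s: abs(s.start - tumor.start))
            relations.append((tumor.text.lower(), nearest.text.lower()))
    return relations


def f1_score(predicted, gold):
    """F1 over sets of predicted and gold items (harmonic mean of P and R)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


summary = "Invasive carcinoma of the left breast, 22 mm. Benign lesion in the right breast."
entities = extract_entities(summary)
relations = link_tumors_to_sides(entities)
```

A real system would need far richer rules (negation, laterality scoping, temporal context); the sketch only shows how hand-written rules keep each extraction step inspectable, which is the contrast with black-box methods drawn in the abstract.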