García-Barragán Álvaro, Sakor Ahmad, Vidal Maria-Esther, Menasalvas Ernestina, Gonzalez Juan Cristobal Sanchez, Provencio Mariano, Robles Víctor
Center of Biomedical Technology, Universidad Politécnica de Madrid, Campus Montegancedo, Pozuelo de Alarcón, 28223, Madrid, Spain.
Data Science Institute, Leibniz University of Hannover, Welfengarten 1, Hannover, 30060, Lower Saxony, Germany.
Med Biol Eng Comput. 2025 Mar;63(3):749-772. doi: 10.1007/s11517-024-03227-4. Epub 2024 Nov 1.
Accurate recognition and linking of oncologic entities in clinical notes is essential for extracting insights that support cancer research, patient care, clinical decision-making, and treatment optimization. We present the Neuro-Symbolic System for Cancer (NSSC), a hybrid AI framework that combines neuro-symbolic methods with named entity recognition (NER) and entity linking (EL) to map unstructured clinical notes to structured terms from medical vocabularies, using the Unified Medical Language System (UMLS) as a case study. NSSC was evaluated on a dataset of clinical notes from breast cancer patients and showed significant improvements in the accuracy of both entity recognition and entity linking over state-of-the-art models. Specifically, NSSC achieved a 33% improvement over BioFalcon and a 58% improvement over scispaCy. By combining large language models (LLMs) with symbolic reasoning, NSSC improves the recognition and interoperability of oncologic entities and enables their integration with existing biomedical knowledge. This approach is a significant step forward in extracting meaningful information from clinical narratives, with promising applications in cancer research and personalized patient care.
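The two stages named above, NER followed by EL against a controlled vocabulary, can be illustrated with a minimal sketch. It is not the NSSC implementation. The neural (LLM-based) recognizer is replaced by a simple longest-match dictionary matcher, and the symbolic step is reduced to normalization, abbreviation expansion, and exact lookup. The vocabulary, the abbreviation rule, and identifiers such as `CUI_BREAST_CANCER` are hypothetical placeholders, not real UMLS CUIs, and the helper names `recognize`, `link`, and `annotate` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy vocabulary: surface form -> concept identifier.
# These identifiers are placeholders, NOT real UMLS CUIs.
VOCAB = {
    "breast cancer": "CUI_BREAST_CANCER",
    "breast carcinoma": "CUI_BREAST_CANCER",
    "tamoxifen": "CUI_TAMOXIFEN",
    "her2": "CUI_HER2",
}

# Hypothetical symbolic rule: expand abbreviations before lookup.
ABBREVIATIONS = {"bc": "breast cancer"}


@dataclass
class Entity:
    text: str                         # mention as it appears in the note
    start: int                        # character offset where the mention starts
    end: int                          # character offset just past the mention
    concept_id: Optional[str] = None  # filled in by the linking step


def recognize(note: str) -> List[Entity]:
    """Stand-in for the neural NER step: case-insensitive longest match
    over vocabulary terms and abbreviations, respecting word boundaries."""
    # Longest keys first so the regex alternation prefers longer matches.
    keys = sorted(set(VOCAB) | set(ABBREVIATIONS), key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    return [Entity(m.group(0), m.start(), m.end()) for m in pattern.finditer(note)]


def link(entity: Entity) -> Entity:
    """Symbolic linking step: lowercase, expand abbreviations, exact lookup.
    Mentions with no vocabulary entry keep concept_id = None."""
    surface = entity.text.lower()
    surface = ABBREVIATIONS.get(surface, surface)
    entity.concept_id = VOCAB.get(surface)
    return entity


def annotate(note: str) -> List[Entity]:
    """Run recognition, then link each recognized mention."""
    return [link(e) for e in recognize(note)]


note = "Pt with BC (HER2-positive), started on tamoxifen."
for e in annotate(note):
    print(e.text, e.start, e.end, e.concept_id)
```

Separating recognition from linking mirrors the NER/EL split described in the abstract: in a real system the recognizer would be a learned model, and the linker would query a full terminology such as UMLS rather than a hand-written dictionary.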