Carolina Health Informatics Program, University of North Carolina at Chapel Hill, NC.
AMIA Annu Symp Proc. 2023 Apr 29;2022:349-358. eCollection 2022.
This paper proposes a new cohort identification system that exploits the semantic hierarchy of SNOMED CT to overcome the limitations of supervised machine learning-based approaches. Eligibility criteria descriptions and free-text clinical notes from the 2018 National NLP Clinical Challenges (n2c2) were mapped to relevant SNOMED CT concepts, and the semantic similarity between each eligibility criterion and each patient was measured. A patient was deemed eligible for a criterion if the patient's similarity score exceeded a threshold cut-off value. The system was evaluated on three eligibility criteria and exceeded the previously reported results of the 2018 n2c2, achieving an average F1 score of 0.933. This study demonstrates that SNOMED CT alone can be leveraged for cohort identification tasks, without referring to external textual sources for training.
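The thresholding step described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the system uses, so this example assumes a Wu-Palmer style measure over a small is-a hierarchy. The concept labels, the `PARENT` table, the function names, and the 0.8 threshold are all hypothetical and are not taken from the paper or from SNOMED CT itself.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent). The labels are
# illustrative placeholders, not real SNOMED CT identifiers.
PARENT: Dict[str, Optional[str]] = {
    "clinical_finding": None,
    "disorder": "clinical_finding",
    "cardiovascular_disorder": "disorder",
    "myocardial_infarction": "cardiovascular_disorder",
    "angina": "cardiovascular_disorder",
    "metabolic_disorder": "disorder",
    "diabetes_mellitus": "metabolic_disorder",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity (an assumed measure, not the paper's stated one)."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first concept shared with a is the deepest
    # common ancestor.
    common = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(common) / (depth(a) + depth(b))


def patient_score(criterion: List[str], patient: List[str]) -> float:
    """Average, over criterion concepts, of the best match among patient concepts."""
    if not criterion or not patient:
        return 0.0
    best = [max(wu_palmer(c, p) for p in patient) for c in criterion]
    return sum(best) / len(best)


def is_eligible(criterion: List[str], patient: List[str],
                threshold: float = 0.8) -> bool:
    """A patient is eligible when the score is strictly above the threshold."""
    return patient_score(criterion, patient) > threshold


if __name__ == "__main__":
    criterion = ["myocardial_infarction"]
    print(is_eligible(criterion, ["myocardial_infarction", "diabetes_mellitus"]))
    print(is_eligible(criterion, ["diabetes_mellitus"]))
```

In a real pipeline the concept lists would come from a concept-mapping step applied to the criteria text and the clinical notes, and the threshold would be tuned per criterion. Both are outside the scope of this sketch.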