El-Hayek Carol, Barzegar Siamak, Faux Noel, Doyle Kim, Pillai Priyanka, Mutch Simon J, Vaisey Alaina, Ward Roger, Sanci Lena, Dunn Adam G, Hellard Margaret E, Hocking Jane S, Verspoor Karin, Boyle Douglas IR
Burnet Institute, Melbourne, Australia; Melbourne School of Population and Global Health, University of Melbourne, Australia; School of Public Health and Preventive Medicine, Monash University, Australia.
School of Computing and Information Systems, University of Melbourne, Australia.
Int J Med Inform. 2023 May;173:105021. doi: 10.1016/j.ijmedinf.2023.105021. Epub 2023 Feb 11.
Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed; however, given wide variations in clinical documentation practices, these cannot be utilized without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes.
Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning based (MIST). Three hundred patient progress notes from three general practice clinics were manually annotated with personally identifying information. We conducted a pairwise comparison between the manual annotations and the patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), F1-score (the harmonic mean of precision and recall), and F2-score (which weights recall twice as heavily as precision). An error analysis was also conducted to better understand each tool's structure and performance.
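The evaluation metrics above can be sketched as follows. This is an illustrative implementation of the standard precision/recall/F-beta formulas, not the authors' evaluation code; the token counts are hypothetical and chosen only to show the calculation.

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: weights recall beta^2 times more heavily than precision.
    beta=1 gives the F1-score; beta=2 gives the F2-score used in the study."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts for one identifier category (not figures from the study):
tp, fp, fn = 60, 40, 30          # detected correctly / falsely flagged / missed
precision = tp / (tp + fp)       # positive predictive value = 0.6
recall = tp / (tp + fn)          # sensitivity ≈ 0.667
f1 = fbeta(precision, recall, 1.0)
f2 = fbeta(precision, recall, 2.0)  # favours recall, the priority in de-identification
```

The F2-score is the natural headline metric here: a missed identifier (a recall failure) risks a privacy breach, whereas a false positive merely redacts harmless text.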
Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%) and all tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE while also achieving similar recall to the rule-based tools for DATE and highest recall for LOCATION. Philter had the lowest aggregate precision (37%), however preliminary adjustments of its rules and dictionaries showed a substantial reduction in false positives.
Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate due to its high recall and flexibility; however, it will require extensive revision of its pattern-matching rules and dictionaries.