Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA.
Int J Med Inform. 2013 Sep;82(9):821-31. doi: 10.1016/j.ijmedinf.2013.03.005. Epub 2013 Apr 30.
We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies with high accuracy.
Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round.
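The round schedule and the per-round metrics described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MIST API or the authors' evaluation code: the function names are hypothetical, the F-score is assumed to be the balanced F1 (harmonic mean of precision and recall), and the initial 10-note annotation is assumed to count as round 1 of the 20 ten-note rounds.

```python
from itertools import accumulate


def round_schedule():
    """Notes per round: 20 rounds of 10, then 6 rounds of 20, then 1 round of 40.

    Assumption: the initial 10-note annotation counts as round 1.
    """
    return [10] * 20 + [20] * 6 + [40]


def cumulative_notes(schedule):
    """Total annotated notes available for training after each round."""
    return list(accumulate(schedule))


def precision_recall_f(true_pos, false_pos, false_neg):
    """Precision, recall, and balanced F-score (F1) from PHI match counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    totals = cumulative_notes(round_schedule())
    for round_number, total in enumerate(totals, start=1):
        print(f"round {round_number:2d}: {total:3d} notes annotated so far")
```

Under these assumptions the schedule has 27 rounds covering 360 notes in total. Pairing each round's F-score with the annotation time logged for that round gives the effort-versus-accuracy comparison the abstract describes.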
After the initial 10-note round (33 min of annotation time), we achieved an F-score of 0.89. After just over 8 h of annotation time (round 21), we achieved an F-score of 0.95. Both the number of annotation actions and the time needed decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that on social work notes, suggesting that the wider variety of protected health information (PHI) in social work notes, and of the contexts in which it appears, is more difficult to model.
It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems.