Morrison Frances P, Li Li, Lai Albert M, Hripcsak George
Columbia University Department of Biomedical Informatics, New York, NY, USA.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):37-9. doi: 10.1197/jamia.M2862. Epub 2008 Oct 24.
Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text, so, in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in the output as a result of processing and identification errors. However, PHI in the output was highly transformed, much of it appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
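The comparison of annotated PHI against XML-tagged output can be illustrated with a minimal sketch. The abstract does not describe how the comparison was carried out, so everything here is an assumption: the XML element and attribute names are hypothetical (not MedLEE's actual schema), and matching is a simple case-insensitive substring test. Because the authors report that leaked PHI was often transformed into normalized terms, verbatim matching like this would likely undercount leaks compared with a manual review. The final line only reproduces the reported proportion, 26 of 809.

```python
import xml.etree.ElementTree as ET


def output_text(xml_output: str) -> str:
    """Collect all element text, tails, and attribute values, lowercased."""
    root = ET.fromstring(xml_output)
    parts = []
    for elem in root.iter():
        parts.extend(elem.attrib.values())
        if elem.text:
            parts.append(elem.text)
        if elem.tail:
            parts.append(elem.tail)
    return " ".join(parts).lower()


def leaked_phi(phi_instances, xml_output):
    """Return annotated PHI strings that appear verbatim (case-insensitive) in the output.

    Simplifying assumption: transformed or normalized PHI will NOT be caught.
    """
    text = output_text(xml_output)
    return [phi for phi in phi_instances if phi.lower() in text]


def leak_rate(n_leaked, n_total):
    """Percentage of PHI instances found in the output."""
    return 100.0 * n_leaked / n_total


# Hypothetical note annotations and hypothetical XML output (not real MedLEE tags).
note_phi = ["John Smith", "Columbia", "555-1234"]
xml = '<doc><sent><problem v="hypertension"/><loc v="columbia"/></sent></doc>'

print(leaked_phi(note_phi, xml))      # ['Columbia']
print(round(leak_rate(26, 809), 1))   # 3.2, the proportion reported in the abstract
```

Substring matching keeps the sketch short. A closer approximation of the study's finding would need fuzzy or concept-level matching to catch PHI that survives in altered form.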