De-identification of primary care electronic medical records free-text data in Ontario, Canada.

Affiliation

Institute for Clinical Evaluative Sciences G106, Toronto, Ontario, M4N 3M5, Canada.

Publication information

BMC Med Inform Decis Mak. 2010 Jun 18;10:35. doi: 10.1186/1472-6947-10-35.

Abstract

BACKGROUND

Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free text in EMRs often contains identifying information. While de-identification tools have been developed for free text, none has been developed or tested for the full range of primary care EMR data.

METHODS

We modified deid, an open-source de-identification program, for the Ontario context and for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice, then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 physicians in 7 practices across 5 cities, and 500 free-text records from a group practice in a different city from the practice used for the training set. The tool was intended to remove patient and physician names, locations, addresses, and medical record, health card and telephone numbers. We measured its sensitivity (recall), precision, specificity, accuracy and F-measure against manually tagged free-text records.
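As a rough illustration of how these five measures relate to one another, the sketch below computes them from token-level labels, where `True` marks a token that should be (or was) removed as identifying. The token-level framing, the `Metrics` class and the `evaluate` function are assumptions made for this example; the abstract does not state the unit of comparison or the exact scoring code the authors used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float  # also called recall
    specificity: float
    precision: float
    accuracy: float
    f_measure: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 rather than raising when a class is absent from the sample.
    return numerator / denominator if denominator else 0.0


def evaluate(gold: Sequence[bool], predicted: Sequence[bool]) -> Metrics:
    """Compare tool output with manual tags (hypothetical token-level scoring)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # identifier, removed
    fn = sum(1 for g, p in pairs if g and not p)      # identifier, missed
    fp = sum(1 for g, p in pairs if not g and p)      # clinical text, removed
    tn = sum(1 for g, p in pairs if not g and not p)  # clinical text, kept

    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return Metrics(
        sensitivity=sensitivity,
        specificity=_ratio(tn, tn + fp),
        precision=precision,
        accuracy=_ratio(tp + tn, len(pairs)),
        f_measure=f_measure,
    )


if __name__ == "__main__":
    # Made-up labels for eight tokens, not data from the study.
    gold = [True, True, True, False, False, False, False, True]
    predicted = [True, True, False, False, False, True, False, True]
    print(evaluate(gold, predicted))
```

In this framing, sensitivity reflects how many identifiers were caught, while specificity and precision reflect how much clinical content was left intact.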

RESULTS

On the training set, the modified program achieved a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The first and second validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8%, and F-measures of 0.89 and 0.84, respectively.
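The reported F-measures can be checked against the reported precision and sensitivity. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (the abstract does not give the formula); the function name `f_measure` and the dictionary layout are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall, reported F-measure) taken from the abstract.
reported = {
    "training": (0.913, 0.883, 0.90),
    "validation 1": (0.911, 0.867, 0.89),
    "validation 2": (0.874, 0.802, 0.84),
}

if __name__ == "__main__":
    for name, (precision, recall, f_reported) in reported.items():
        f_computed = f_measure(precision, recall)
        print(f"{name}: computed F = {f_computed:.2f}, reported F = {f_reported:.2f}")
```

Under that assumption, the computed values round to the reported figures for all three data sets.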

CONCLUSION

The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content.
