
De-identifying an EHR database - anonymity, correctness and readability of the medical record.

Author information

Pantazos Kostas, Lauesen Soren, Lippert Soren

Affiliation

Software Development Group, IT-University of Copenhagen, Denmark.

Publication

Stud Health Technol Inform. 2011;169:862-6.

Abstract

Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version that contains real medical records but links them to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the final de-identified database contains 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
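The three steps named in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (first_name, last_name, note), the surrogate-name pools, and every function name are hypothetical; only person names are treated as identifiers; and the language analysis and special rules the abstract mentions are left out, so any token that matches a name list is always treated as a name.

```python
import re

# Hypothetical sketch of the three-step pipeline; record layout, surrogate
# pools, and function names are illustrative assumptions, and only person
# names are treated as identifiers.


def collect_identifiers(patients, external_first, external_last):
    """Step 1: collect identifiers from the database and external name lists.

    Returns {identifier: category} so a first name can later be replaced by a
    first name and a surname by a surname.
    """
    identifiers = {name: "first" for name in external_first}
    identifiers.update({name: "last" for name in external_last})
    for p in patients:
        for token in p["first_name"].split():
            identifiers[token] = "first"
        for token in p["last_name"].split():
            identifiers[token] = "last"
    return identifiers


def define_replacements(identifiers, surrogates):
    """Step 2: give every identifier one fixed artificial substitute.

    `surrogates` maps a category to a list of artificial names. Any surrogate
    that is itself a collected identifier is skipped; numbered placeholders
    such as "First3" are used once a pool runs out.
    """
    pools = {cat: [s for s in names if s not in identifiers]
             for cat, names in surrogates.items()}
    used = {}
    mapping = {}
    for ident in sorted(identifiers):  # sorted so the mapping is reproducible
        cat = identifiers[ident]
        pool = pools.get(cat, [])
        i = used.get(cat, 0)
        mapping[ident] = pool[i] if i < len(pool) else f"{cat.title()}{i + 1}"
        used[cat] = i + 1
    return mapping


def build_replacer(mapping):
    """Step 3 helper: return a function that swaps whole-word identifiers."""
    if not mapping:
        return lambda text: text
    # Longest keys first so a multi-word identifier wins over its parts;
    # \b stops "Jens" from matching inside "Jensen".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


def deidentify(patients, external_first, external_last, surrogates):
    """Run steps 1-3 over structured fields and free text alike."""
    identifiers = collect_identifiers(patients, external_first, external_last)
    replace = build_replacer(define_replacements(identifiers, surrogates))
    # One mapping for the whole database keeps each person's artificial name
    # consistent across fields and records; non-string values pass through.
    return [{k: replace(v) if isinstance(v, str) else v for k, v in p.items()}
            for p in patients]


patients = [
    {"first_name": "Jens", "last_name": "Hansen",
     "note": "Jens Hansen seen by Dr. Larsen. Wife Mette present."},
]
result = deidentify(
    patients,
    external_first=["Mette"],
    external_last=["Larsen"],
    surrogates={"first": ["Anders", "Birgitte"], "last": ["Nielsen", "Olsen"]},
)
print(result[0]["note"])
# Anders Nielsen seen by Dr. Olsen. Wife Birgitte present.
```

Using a single mapping for the whole database means a given real name always receives the same artificial name, in structured fields and free text alike, which helps keep the record readable. Whole-word matching avoids corrupting longer words, but it cannot tell a name from an identical ordinary word, a gap that rule-based checks such as those the abstract mentions would have to close.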

