Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing.

Author information

Schmeelk Suzanna, Dogo Martins Samuel, Peng Yifan, Patra Braja Gopal

Affiliations

St. John's University, Queens, New York.

Queen's University Belfast, United Kingdom.

Publication information

Proc Annu Hawaii Int Conf Syst Sci. 2022;2022:4140-4146. doi: 10.24251/hicss.2022.505. Epub 2022 Jan 4.

DOI:10.24251/hicss.2022.505

PMID:35528964

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9076271/

Abstract

Clinical notes, which can be embedded into electronic medical records, document patient care delivery and summarize interactions between healthcare providers and patients. These notes directly inform patient care and can also indirectly inform research and quality/safety metrics, among other uses. Recently, some states within the United States of America require patients to have open access to their clinical notes to improve the exchange of patient information for patient care. Thus, developing methods to assess the cyber risks of clinical notes before sharing and exchanging data is critical. While existing natural language processing techniques are geared toward de-identifying clinical notes, to the best of our knowledge, few have focused on classifying sensitive-information risk, which is a fundamental step toward developing effective, widespread protection of patient health information. To bridge this gap, this research investigates methods for identifying security/privacy risks within clinical notes. The classification can be used either upstream, to identify areas within notes that likely contain sensitive information, or downstream, to improve the identification of clinical notes that have not been entirely de-identified. We develop several models using unigram and word2vec features with different classifiers to categorize sentence risk. Experiments on the i2b2 de-identification dataset show that the SVM classifier using word2vec features obtained a maximum F1-score of 0.792. Future research involves articulating and differentiating risk in terms of different global regulatory requirements.
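
The abstract describes sentence-level risk classification from unigram features, scored by F1. The sketch below illustrates that general idea only and is not the authors' implementation. It swaps in a simple perceptron in place of the SVM, omits word2vec features entirely, and trains on a handful of invented sentences rather than the i2b2 data. All function names, the `TRAIN` list, and the labels (1 = risky, 0 = not risky) are hypothetical.

```python
from collections import Counter


def unigram_features(sentence):
    """Return bag-of-words unigram counts (lowercased, whitespace-split)."""
    return Counter(sentence.lower().split())


def _score(weights, bias, feats):
    # Linear score: bias plus the weighted sum of unigram counts.
    return bias + sum(weights[w] * c for w, c in feats.items())


def train_perceptron(examples, epochs=10):
    """Train a binary perceptron (a stand-in for the paper's SVM)."""
    weights, bias = Counter(), 0
    for _ in range(epochs):
        for sentence, label in examples:
            feats = unigram_features(sentence)
            pred = 1 if _score(weights, bias, feats) > 0 else 0
            if pred != label:
                # Move the decision boundary toward the correct label.
                step = 1 if label == 1 else -1
                for w, c in feats.items():
                    weights[w] += step * c
                bias += step
    return weights, bias


def predict(model, sentence):
    """Classify one sentence as risky (1) or not risky (0)."""
    weights, bias = model
    return 1 if _score(weights, bias, unigram_features(sentence)) > 0 else 0


def f1_score(y_true, y_pred):
    """F1 for the positive (risky) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy sentences for illustration only; NOT drawn from i2b2.
TRAIN = [
    ("patient john smith was admitted on 03/14", 1),
    ("contact the patient at 555-0100", 1),
    ("mr. doe lives at 12 elm street", 1),
    ("blood pressure was stable overnight", 0),
    ("continue current medication and monitor", 0),
    ("no acute distress noted on exam", 0),
]

model = train_perceptron(TRAIN)
train_preds = [predict(model, s) for s, _ in TRAIN]
train_f1 = f1_score([y for _, y in TRAIN], train_preds)
```

A score on the training sentences says nothing about generalization. A realistic evaluation would report F1 on held-out sentences, as the abstract's figure of 0.792 presumably does.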

