
Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper.

Authors

Jones Kerina H, Ford Elizabeth M, Lea Nathan, Griffiths Lucy J, Hassan Lamiece, Heys Sharon, Squires Emma, Nenadic Goran

Affiliations

Population Data Science, Medical School, Swansea University, Swansea, United Kingdom.

Brighton and Sussex Medical School, Brighton, United Kingdom.

Publication information

J Med Internet Res. 2020 Jun 29;22(6):e16760. doi: 10.2196/16760.

DOI: 10.2196/16760
PMID: 32597785
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7367542/

Abstract

BACKGROUND

Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product.

OBJECTIVE

This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit.

METHODS

We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text.

RESULTS

We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders.

CONCLUSIONS

By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f0/7367542/9996f658292f/jmir_v22i6e16760_fig1.jpg

