Jones Kerina H, Ford Elizabeth M, Lea Nathan, Griffiths Lucy J, Hassan Lamiece, Heys Sharon, Squires Emma, Nenadic Goran
Population Data Science, Medical School, Swansea University, Swansea, United Kingdom.
Brighton and Sussex Medical School, Brighton, United Kingdom.
J Med Internet Res. 2020 Jun 29;22(6):e16760. doi: 10.2196/16760.
Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines governing effective deidentification and use of free-text information in ways that avoid damaging data utility as a by-product.
This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit.
We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free text.
We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data; to ensure public transparency in data flows and uses; to treat deidentified free-text data as potentially identifiable, with use limited to accredited data safe havens; and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so that this can be communicated to all stakeholders.
By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. Further work is needed, in accordance with existing data governance frameworks and with commitment and investment, to take forward the recommendations we have proposed and to assure and expand the safe reuse of clinical free-text data for public benefit.