Kafatos George, Levy Julia, Jose Sophie, Hindocha Pooja, Archangelidi Olia, Vernon Sally, Frayling Lora
Center for Observational Research, Amgen Ltd, Uxbridge, UK.
IQVIA Ltd, London, UK.
Ther Innov Regul Sci. 2025 Jun 5. doi: 10.1007/s43441-025-00820-z.
Real-world data (RWD) are increasingly recognized, including by regulatory bodies, as critical to advancing drug development and health care delivery. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used to carry out preliminary data analysis, hypothesis generation and development of programming code that can be executed to run analyses on CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model demonstrated an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution that other data custodians can adopt to enable broader use of RWD and accelerate health care innovation.