Wang Taowei D, Henderson Darren W, Weber Griffin M, Morris Michele, Sadhu Eugene M, Murphy Shawn N, Visweswaran Shyam, Klann Jeff G
Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA.
Center for Clinical and Translational Science, University of Kentucky, Lexington, KY 40536, USA.
medRxiv. 2025 Jan 17:2025.01.17.25320686. doi: 10.1101/2025.01.17.25320686.
Federated research networks, such as Evolve to Next-Gen Accrual of patients to Clinical Trials (ENACT), aim to facilitate medical research by exchanging electronic health record (EHR) data. However, poor data quality can undermine this goal. While networks typically set guidelines and standards to address the problem, we developed an organically evolving, data-centric method that uses patient counts to identify data quality issues and that is applicable even to sites not yet in the network.
We distribute high-performance patient counting scripts as part of Informatics for Integrating Biology and the Bedside (i2b2), which all ENACT sites operate. At each site, the scripts produce counts of patients associated with ENACT ontology terms. At the ENACT Hub, our pipeline aggregates the site-contributed counts into network statistics, which our self-service web application, Data Quality Explorer (DQE), ingests to help sites investigate their data quality relative to the network.
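The aggregation step described above could be sketched roughly as follows. This is an illustrative sketch only, not the ENACT pipeline or the DQE metric: the function names, the term labels, the use of a per-term median of patient shares, and the `ratio` threshold are all assumptions introduced here. It operates only on aggregate counts, so no patient-level data is involved.

```python
from statistics import median


def network_stats(site_counts, site_totals):
    """Aggregate per-site term counts into per-term network statistics.

    site_counts: {site: {term: patient_count}}  (hypothetical layout)
    site_totals: {site: total_patients_at_site}
    """
    shares_by_term = {}
    for site, counts in site_counts.items():
        total = site_totals[site]
        for term, n in counts.items():
            # Normalize to a share so sites of different sizes are comparable.
            shares_by_term.setdefault(term, {})[site] = n / total
    return {
        term: {"median": median(shares.values()), "n_sites": len(shares)}
        for term, shares in shares_by_term.items()
    }


def flag_outliers(site, site_counts, site_totals, stats, ratio=5.0):
    """Return terms whose share at `site` differs from the network median
    by more than a factor of `ratio` (an assumed, arbitrary threshold)."""
    flagged = []
    for term, s in stats.items():
        med = s["median"]
        if med == 0:
            continue  # no meaningful reference value for this term
        share = site_counts[site].get(term, 0) / site_totals[site]
        if share > med * ratio or share < med / ratio:
            flagged.append(term)
    return sorted(flagged)


# Made-up counts for three hypothetical sites and two hypothetical terms.
site_counts = {
    "A": {"dx:diabetes": 100, "lab:a1c": 80},
    "B": {"dx:diabetes": 220, "lab:a1c": 150},
    "C": {"dx:diabetes": 50, "lab:a1c": 1},
}
site_totals = {"A": 1000, "B": 2000, "C": 500}

stats = network_stats(site_counts, site_totals)
flagged_c = flag_outliers("C", site_counts, site_totals, stats)
print(flagged_c)
```

In this made-up data, site C reports a diabetes share in line with the other sites but almost no patients with the lab term, which is the kind of discrepancy a site might then investigate locally (for example, a missing or mis-mapped lab feed).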
Thirteen ENACT sites have contributed their patient counts, and seven sites have so far signed up to use DQE to analyze data quality issues. We have issued a call to all ENACT sites to contribute additional patient counts.
Identifying site data quality problems relative to the network is novel. A metric based on evolving network statistics complements rigid data quality checks. The approach is adaptable to any network and has a low barrier to entry, with patient counting being the sole requirement.
We implemented a metric for conducting data quality investigation in ENACT using patient counting and network statistics. Our end-to-end pipeline is privacy-preserving and the underlying design is generalizable.