Covance, the Drug Development Division of LabCorp, Carnegie Center, Princeton, NJ, USA.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz032.
Clinical trial data are typically collected through multiple systems developed by different vendors using different technologies and data standards. These data need to be integrated, standardized and transformed for a variety of monitoring and reporting purposes. The need to process large volumes of often inconsistent data in the presence of ever-changing requirements poses a significant technical challenge. As part of a comprehensive clinical data repository, we have developed a data warehouse that integrates patient data from any source, standardizes it and makes it accessible to study teams in a timely manner to support a wide range of analytic tasks for both in-flight and completed studies. Our solution combines Apache HBase, a NoSQL column store; Apache Phoenix, a massively parallel relational query engine; and a user-friendly interface to facilitate efficient loading of large volumes of data under incomplete or ambiguous specifications. It uses an extract-load-transform design pattern that defers data mapping until query time. This approach allows us to maintain a single copy of the data and transform it dynamically into any desired format without requiring additional storage. Changes to the mapping specifications can be easily introduced, and multiple representations of the data can be made available concurrently. Further, by versioning the data and the transformations separately, we can apply historical maps to current data or current maps to historical data, which simplifies the maintenance of data cuts and facilitates interim analyses for adaptive trials. The result is a highly scalable, secure and redundant solution that combines the flexibility of a NoSQL store with the robustness of a relational query engine to support a broad range of applications, including clinical data management, medical review, risk-based monitoring, safety signal detection, post hoc analysis of completed studies and many others.
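The extract-load-transform idea in the abstract (store raw data once, apply the mapping only at query time, and version data and maps independently) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use HBase or Phoenix; the store layout, the column names (`SUBJ`, `SEX_CD`, `WT_LB`, `USUBJID`, `SEX`, `WEIGHT_KG`), the version numbers and the `query` function are all hypothetical, chosen only to show the pattern.

```python
from typing import Callable, Dict, List

# Hypothetical raw records, loaded as-is and stored once per data version.
# In the system described above, this role is played by the NoSQL store.
RAW_STORE: Dict[int, List[dict]] = {
    1: [{"SUBJ": "001", "SEX_CD": "1", "WT_LB": "154"}],
    2: [
        {"SUBJ": "001", "SEX_CD": "1", "WT_LB": "150"},
        {"SUBJ": "002", "SEX_CD": "2", "WT_LB": "132"},
    ],
}

# Hypothetical mapping specifications, versioned separately from the data.
# Each map sends a target column name to a function of the raw record.
MAPS: Dict[int, Dict[str, Callable[[dict], object]]] = {
    1: {
        "USUBJID": lambda r: r["SUBJ"],
        "SEX": lambda r: r["SEX_CD"],
    },
    2: {
        "USUBJID": lambda r: "STUDY1-" + r["SUBJ"],
        "SEX": lambda r: {"1": "M", "2": "F"}.get(r["SEX_CD"], "U"),
        "WEIGHT_KG": lambda r: round(float(r["WT_LB"]) * 0.45359237, 1),
    },
}


def query(data_version: int, map_version: int) -> List[dict]:
    """Apply a chosen map version to a chosen data version at query time.

    Nothing is materialized: the raw rows are left untouched, and the
    transformed view exists only in the returned result.
    """
    mapping = MAPS[map_version]
    return [
        {target: fn(raw) for target, fn in mapping.items()}
        for raw in RAW_STORE[data_version]
    ]


if __name__ == "__main__":
    # Current map applied to a historical data cut.
    print(query(data_version=1, map_version=2))
    # Historical map applied to current data.
    print(query(data_version=2, map_version=1))
```

Because the mapping is just a lookup applied while reading, changing a specification means adding a new entry to `MAPS` rather than reloading or copying data, and any combination of data version and map version can be requested side by side.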