Stephen John J, Carolan Padraig, Krefman Amy E, Sedaghat Sanaz, Mansolf Maxwell, Allen Norrina B, Scholtens Denise M
Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
Division of Epidemiology and Community Health, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA.
Patterns (N Y). 2024 Jun 14;5(8):101003. doi: 10.1016/j.patter.2024.101003. eCollection 2024 Aug 9.
Combining pertinent data from multiple studies can increase the robustness of epidemiological investigations. Effective "pre-statistical" data harmonization is paramount to the streamlined conduct of collective, multi-study analysis. Harmonizing data and documenting the decisions made when transforming variables to a common set of categorical values and measurement scales are time consuming and error prone, particularly when many studies with large numbers of variables are involved. The R package described here facilitates harmonization by combining multiple datasets, applying data transformation functions, and creating long and wide harmonized datasets. The user provides transformation instructions in a "harmonization sheet" that includes dataset names, variable names, and coding instructions and that centrally tracks all decisions. The package performs the harmonization, generates error logs as necessary, and creates summary reports of the harmonized data. The package is poised to serve as a central feature of data preparation for the joint analysis of multiple studies.
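The sheet-driven workflow described in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, and it is not the R package's API: the dataset contents, the sheet columns (`study`, `source_var`, `target_var`, `kind`, `mapping`, `factor`), and the functions `harmonize` and `to_wide` are all hypothetical names chosen for illustration. The sketch only shows the general idea of applying one rule per study and variable, logging values that cannot be transformed, and producing long and wide outputs.

```python
# Illustrative sketch only: hypothetical data, sheet layout, and function names.

# Two hypothetical studies that code sex and height differently.
datasets = {
    "study_a": [
        {"id": 1, "sex": "M", "height_cm": 180.0},
        {"id": 2, "sex": "F", "height_cm": 165.0},
    ],
    "study_b": [
        {"id": 1, "gender": 1, "height_in": 70.0},
        {"id": 2, "gender": 2, "height_in": 64.0},
        {"id": 3, "gender": 9, "height_in": 68.0},  # 9 is not a mapped code
    ],
}

# Hypothetical harmonization sheet: one rule per (study, target variable).
sheet = [
    {"study": "study_a", "source_var": "sex", "target_var": "sex",
     "kind": "recode", "mapping": {"M": "male", "F": "female"}},
    {"study": "study_a", "source_var": "height_cm", "target_var": "height_cm",
     "kind": "numeric", "factor": 1.0},
    {"study": "study_b", "source_var": "gender", "target_var": "sex",
     "kind": "recode", "mapping": {1: "male", 2: "female"}},
    {"study": "study_b", "source_var": "height_in", "target_var": "height_cm",
     "kind": "numeric", "factor": 2.54},
]


def harmonize(datasets, sheet):
    """Apply every sheet rule and return (long_rows, error_log)."""
    long_rows, error_log = [], []
    for rule in sheet:
        for record in datasets[rule["study"]]:
            raw = record.get(rule["source_var"])
            try:
                if rule["kind"] == "recode":
                    value = rule["mapping"][raw]  # KeyError for unmapped codes
                else:
                    value = round(float(raw) * rule["factor"], 1)
            except (KeyError, TypeError, ValueError):
                # Keep the row, but record the failure instead of stopping.
                value = None
                error_log.append({"study": rule["study"], "id": record["id"],
                                  "variable": rule["target_var"], "raw": raw})
            long_rows.append({"study": rule["study"], "id": record["id"],
                              "variable": rule["target_var"], "value": value})
    return long_rows, error_log


def to_wide(long_rows):
    """Pivot long rows into one row per (study, id)."""
    wide = {}
    for row in long_rows:
        key = (row["study"], row["id"])
        entry = wide.setdefault(key, {"study": row["study"], "id": row["id"]})
        entry[row["variable"]] = row["value"]
    return list(wide.values())


long_rows, error_log = harmonize(datasets, sheet)
wide_rows = to_wide(long_rows)
for row in wide_rows:
    print(row)
print("errors:", error_log)
```

Keeping every transformation in the sheet rather than in scattered scripts is what makes the decisions traceable: changing a coding rule means editing one row, and the error log shows which source values the rule did not cover.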