Wu Xin, Stratford Jeran, Kesler Karen, Ives Cataia, Hendershot Tabitha, Kroner Barbara, Qin Ying, Pan Huaqin
Analytics Practice Area, RTI International, Research Triangle Park, United States of America.
bioRxiv. 2024 Aug 19:2024.08.15.608203. doi: 10.1101/2024.08.15.608203.
Sickle cell disease (SCD) is a rare group of inherited red blood cell disorders that affect hemoglobin, resulting in serious multi-system complications. The limited number of patients available to participate in research studies can hinder the investigation of complex relationships. Secondary analysis is a research method that uses existing data to answer new research questions. Data harmonization enables secondary analysis by combining data across studies, which is especially helpful for rare disease research where individual studies may be small. The National Heart, Lung, and Blood Institute Cure Sickle Cell Initiative (CureSCi) Metadata Catalog is a web-based tool for identifying SCD study datasets for data harmonization and secondary analysis. We present a proof-of-concept secondary analysis that explores factors associated with discontinuation of hydroxyurea, a safe and effective first-line SCD therapy, to illustrate how the CureSCi Metadata Catalog can expedite and enable more robust SCD research.
We performed secondary analysis of SCD studies using a multi-step workflow: develop research questions, identify study datasets, identify variables of interest, harmonize variables, and establish an analysis method. A harmonized dataset consisting of eight predictor variables across five studies was created. For the secondary analysis, a generalized linear model was employed to identify factors significantly associated with hydroxyurea discontinuation.
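The variable harmonization step in this workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the study labels, source variable names, and code tables are hypothetical and do not come from the CureSCi Metadata Catalog or from the five studies analyzed.

```python
# Hypothetical per-study codebooks (illustrative only, not from the paper).
# "rename" maps each source variable to a harmonized name; "recode" maps
# study-specific category codes to a shared coding.
STUDY_MAPPINGS = {
    "study_a": {
        "rename": {"AGE_YRS": "age", "GENDER": "sex", "HU_STOP": "hu_discontinued"},
        "recode": {
            "sex": {"F": "female", "M": "male"},
            "hu_discontinued": {"Y": 1, "N": 0},
        },
    },
    "study_b": {
        "rename": {"age_at_enroll": "age", "sex_cd": "sex", "stopped_hu": "hu_discontinued"},
        "recode": {
            "sex": {1: "male", 2: "female"},
            "hu_discontinued": {1: 1, 0: 0},
        },
    },
}


def harmonize_record(study, record):
    """Translate one source record into the harmonized schema."""
    mapping = STUDY_MAPPINGS[study]
    out = {"study": study}  # keep provenance so study effects can be modeled
    for source_name, target_name in mapping["rename"].items():
        value = record.get(source_name)
        recode = mapping["recode"].get(target_name)
        if recode is not None:
            # Codes missing from the table become None (treated as missing).
            value = recode.get(value)
        out[target_name] = value
    return out


def harmonize(datasets):
    """Stack records from all studies into one list of harmonized rows."""
    return [
        harmonize_record(study, record)
        for study, records in datasets.items()
        for record in records
    ]
```

Keeping the source study as a column in every harmonized row is a design choice of this sketch; it lets a later model account for between-study differences.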
The CureSCi Metadata Catalog provided a platform to efficiently find relevant studies and design a harmonization strategy to prepare data for secondary analysis. Multivariate analysis of the harmonized data identified that patients who are older, are female, had a history of blood transfusion therapy, had episodes of acute chest syndrome, or had the SC sickle cell genotype are more likely to stop hydroxyurea treatment.
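To show how a generalized linear model links predictors to a binary outcome such as discontinuation, the sketch below fits a logistic regression by Newton-Raphson using only the Python standard library. The abstract does not state the model family or link function, so the binomial family with a logit link is an assumption here, and the data are a synthetic toy table rather than study data.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Fit a binomial GLM (logit link, assumed) and return [intercept, slopes...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))  # predicted probability
            w = mu * (1.0 - mu)                # variance weight
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton step: inverse information times score
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Synthetic toy data (NOT from the study): one binary predictor x.
# x = 0: 2 of 10 discontinue; x = 1: 5 of 10 discontinue.
X = [[0]] * 10 + [[1]] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5

beta = fit_logistic(X, y)
odds_ratio = math.exp(beta[1])  # (5/5) / (2/8) = 4 for this toy table
```

In a model of this kind, an exponentiated coefficient above 1 indicates higher odds of the outcome for that predictor, holding the other predictors fixed.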
This secondary analysis provides a template for how the CureSCi Metadata Catalog expedites dataset discovery of sickle cell studies for identifying relationships between variables or validating existing findings.