He Zhe, Ryan Patrick, Hoxha Julia, Wang Shuang, Carini Simona, Sim Ida, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Janssen Research and Development, Titusville, NJ 08560, USA; Observational Health Data Sciences and Informatics, New York, NY 10032, USA.
J Biomed Inform. 2016 Apr;60:66-76. doi: 10.1016/j.jbi.2016.01.007. Epub 2016 Jan 25.
To develop a multivariate method for quantifying the population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies.
We extended a published metric, the Generalizability Index for Study Traits (GIST), to include multiple study traits for quantifying the population representativeness of a set of related studies, assuming that all study traits are independent and equally important. On this basis, we qualitatively compared the effectiveness of GIST and multivariate GIST (mGIST). We further developed an algorithm called "Multivariate Underrepresented Subgroup Identification" (MAGIC), which constructs optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables from a set of clinical studies. We profiled the T2DM target population using data from the National Health and Nutrition Examination Survey (NHANES).
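The idea of a multivariate representativeness score can be illustrated with a minimal sketch. It is not the published mGIST formula, which the abstract does not state. It assumes one plausible reading: each target-population patient contributes the fraction of studies whose ranges for all considered traits admit that patient, and contributions are averaged using survey weights. The function names, the dictionary layout, and the example values are all hypothetical.

```python
# Illustrative sketch only: an assumed, simplified representativeness score,
# not the authors' mGIST definition.

def is_eligible(patient, criteria):
    """Return True if the patient falls inside every trait range of one study.

    patient:  dict mapping trait name -> numeric value
    criteria: dict mapping trait name -> (low, high), bounds inclusive (assumed)
    """
    return all(low <= patient[trait] <= high
               for trait, (low, high) in criteria.items())


def representativeness(patients, weights, studies):
    """Weighted mean, over patients, of the fraction of studies admitting them."""
    total_weight = sum(weights)
    score = 0.0
    for patient, weight in zip(patients, weights):
        admitted = sum(is_eligible(patient, study) for study in studies)
        score += weight * admitted / len(studies)
    return score / total_weight


# Hypothetical target-population sample (e.g., survey respondents) and weights.
patients = [
    {"age": 70, "hba1c": 6.5},
    {"age": 50, "hba1c": 8.0},
]
weights = [1.0, 1.0]

# Hypothetical eligibility ranges for two studies.
studies = [
    {"age": (18, 65), "hba1c": (7.0, 10.0)},
    {"age": (18, 80), "hba1c": (6.0, 10.0)},
]

score = representativeness(patients, weights, studies)
```

In this toy input, the older patient with lower HbA1c is admitted by only one of the two studies, which lowers the overall score; a score of 1.0 would mean every study admits every patient in the sample.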
According to the mGIST scores for four example variables (age, HbA1c, BMI, and gender), the included observational T2DM studies had better population representativeness than the interventional T2DM studies. Among the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with an HbA1c value between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed existing knowledge and demonstrated the effectiveness of our methods for assessing population representativeness.
mGIST is effective at quantifying the population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies. Both data-driven methods can be used to improve the transparency of design bias in participant selection at the research community level.