Ventz Steffen, Mazumder Rahul, Trippa Lorenzo
Department of Data Science, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Biometrics. 2022 Dec;78(4):1365-1376. doi: 10.1111/biom.13517. Epub 2021 Sep 16.
We introduce a statistical procedure that integrates datasets from multiple biomedical studies to predict patients' survival, based on individual clinical and genomic profiles. The proposed procedure accounts for potential differences in the relation between predictors and outcomes across studies, due to distinct patient populations, treatments and technologies to measure outcomes and biomarkers. These differences are modeled explicitly with study-specific parameters. We use hierarchical regularization to shrink the study-specific parameters towards each other and to borrow information across studies. The estimation of the study-specific parameters utilizes a similarity matrix, which summarizes differences and similarities of the relations between covariates and outcomes across studies. We illustrate the method in a simulation study and using a collection of gene expression datasets in ovarian cancer. We show that the proposed model increases the accuracy of survival predictions compared to alternative meta-analytic methods.
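The idea of shrinking study-specific parameters towards each other, guided by a similarity matrix, can be illustrated with a deliberately simplified sketch. It is not the authors' estimator: the abstract concerns survival outcomes, whereas this example assumes a squared-error loss and a quadratic fusion penalty so that the solution has a closed form. The function name `fit_fused_studies`, the penalty weight `lam`, the small `ridge` term, and the similarity matrix `S` are all hypothetical names introduced for illustration.

```python
import numpy as np

def fit_fused_studies(Xs, ys, S, lam, ridge=1e-6):
    """Illustrative only: study-specific linear models with a fusion penalty.

    Minimizes (an assumed objective, not taken from the paper)
        sum_k 0.5 * ||y_k - X_k b_k||^2
        + 0.5 * lam * sum_{k<l} S[k, l] * ||b_k - b_l||^2
        + 0.5 * ridge * sum_k ||b_k||^2
    where S[k, l] >= 0 encodes how similar studies k and l are assumed to be.
    Returns an array of shape (K, p), one coefficient vector per study.
    """
    K = len(Xs)
    p = Xs[0].shape[1]

    # Symmetrize the similarity matrix and drop self-similarities.
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0
    np.fill_diagonal(S, 0.0)

    # Graph Laplacian of the study-similarity graph; the fusion penalty
    # equals 0.5 * lam * vec(B)^T (L kron I_p) vec(B).
    L = np.diag(S.sum(axis=1)) - S

    # Assemble the normal equations for the stacked coefficient vector.
    A = lam * np.kron(L, np.eye(p)) + ridge * np.eye(K * p)
    b = np.zeros(K * p)
    for k, (X, y) in enumerate(zip(Xs, ys)):
        block = slice(k * p, (k + 1) * p)
        A[block, block] += X.T @ X
        b[block] = X.T @ y

    return np.linalg.solve(A, b).reshape(K, p)


if __name__ == "__main__":
    # Synthetic data: three studies whose true coefficients differ slightly.
    rng = np.random.default_rng(0)
    base = np.array([1.0, -0.5, 0.25])
    Xs = [rng.normal(size=(40, 3)) for _ in range(3)]
    ys = [X @ (base + 0.1 * rng.normal(size=3)) + rng.normal(size=40) for X in Xs]
    S = np.ones((3, 3))  # assume all studies are equally similar

    for lam in (0.0, 10.0, 1000.0):
        B = fit_fused_studies(Xs, ys, S, lam)
        spread = np.linalg.norm(B - B.mean(axis=0))
        print(f"lam={lam}: spread of study-specific coefficients = {spread:.4f}")
```

In this sketch, `lam = 0` fits each study separately, while a very large `lam` forces nearly identical coefficients across studies, which amounts to pooling. Intermediate values borrow information between studies in proportion to the entries of `S`.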