Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
J Biomed Inform. 2022 Oct;134:104176. doi: 10.1016/j.jbi.2022.104176. Epub 2022 Aug 23.
For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.
For each center from which we want to borrow information to improve prediction performance for the target population, a penalized Cox model is fitted to estimate that center's feature coefficients. Using the estimated feature coefficients and the covariance matrix of the target population, we then obtain the SurvMaximin estimate of the feature coefficients for the target population. The target population can be an entire cohort composed of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning.
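The aggregation step described above can be illustrated with a minimal sketch. It assumes the generic maximin-effect formulation, in which the target coefficients are a convex combination of the per-center coefficient vectors, with weights chosen to minimize the quadratic form defined by the target covariance matrix. Whether SurvMaximin uses exactly this objective, or adds regularization to it, is not stated here, so this should not be read as the authors' implementation. The function name maximin_aggregate, its arguments, and the choice of a Frank-Wolfe solver with exact line search are all assumptions made for illustration.

```python
def _matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def maximin_aggregate(site_coefs, target_cov, n_iter=500):
    """Hypothetical helper: combine per-center Cox coefficients into one vector.

    site_coefs: list of L coefficient vectors, each of length p.
    target_cov: symmetric p x p covariance matrix of the target features.
    Returns (beta, weights); weights lie on the probability simplex.
    """
    n_sites = len(site_coefs)
    # gram[l][k] = b_l' * Sigma * b_k
    cov_times_coef = [_matvec(target_cov, b) for b in site_coefs]
    gram = [[_dot(b, sb) for sb in cov_times_coef] for b in site_coefs]

    weights = [1.0 / n_sites] * n_sites  # start from equal weights
    for _ in range(n_iter):
        gram_w = _matvec(gram, weights)  # half of the gradient of w' gram w
        # Frank-Wolfe: move toward the simplex vertex with the smallest gradient.
        best = min(range(n_sites), key=gram_w.__getitem__)
        direction = [(1.0 if l == best else 0.0) - w
                     for l, w in enumerate(weights)]
        curvature = _dot(direction, _matvec(gram, direction))
        if curvature <= 1e-15:
            break  # no curvature along this direction, nothing left to gain
        # Exact line search for a quadratic, clipped to stay inside the simplex.
        step = min(1.0, max(0.0, -_dot(gram_w, direction) / curvature))
        if step == 0.0:
            break
        weights = [w + step * d for w, d in zip(weights, direction)]

    n_features = len(site_coefs[0])
    beta = [sum(w * b[j] for w, b in zip(weights, site_coefs))
            for j in range(n_features)]
    return beta, weights


# Toy usage: two centers, two features, identity target covariance.
beta, weights = maximin_aggregate(
    site_coefs=[[2.0, 0.0], [0.0, 1.0]],
    target_cov=[[1.0, 0.0], [0.0, 1.0]],
)
# weights are approximately [0.2, 0.8] and beta approximately [0.4, 0.8]
```

In this toy case the center with the larger coefficient norm receives the smaller weight, which is the sense in which a maximin-type estimator guards against any single center dominating. Only coefficient vectors and a covariance matrix enter the computation, consistent with the summary-level exchange described in the abstract.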
Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves accuracy comparable to or higher than that of the estimator using only target-site information and of other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and in estimated feature coefficients between centers, which translates into significantly improved estimates for target sites with fewer observations.
The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin requires only a one-time exchange of summary information from participating centers, and the estimated regression vectors of those centers can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.