Imakura Akira, Tsunoda Ryoya, Kagawa Rina, Yamagata Kunihiro, Sakurai Tetsuya
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan.
University of Tsukuba Hospital, University of Tsukuba, 2-1-1 Amakubo, Tsukuba, Ibaraki 305-8576, Japan.
J Biomed Inform. 2023 Jan;137:104264. doi: 10.1016/j.jbi.2022.104264. Epub 2022 Nov 30.
The demand for privacy-preserving survival analysis of medical data integrated from multiple institutions or countries has increased. However, sharing the original medical data is difficult because of privacy concerns, and even where sharing is possible, cross-institutional or cross-border communication incurs substantial costs. To address these difficulties in multi-party privacy-preserving survival analysis, this study proposes a novel data collaboration Cox proportional hazards (DC-COX) model based on a data collaboration framework for horizontally and vertically partitioned data. By integrating dimensionality-reduced intermediate representations instead of the original data, DC-COX performs privacy-preserving survival analysis without iterative cross-institutional communication or large computational costs. DC-COX enables each local party to obtain an approximation of the maximum likelihood model parameter, corresponding statistics such as the p-value, and survival curves for subgroups. We also introduce a dimensionality reduction method, based on a bootstrap technique, to improve the efficiency of DC-COX. Numerical experiments demonstrate that DC-COX computes the model parameter and the corresponding statistics with higher performance than local-party analysis. In particular, DC-COX demonstrates outstanding performance in essential feature selection based on the p-value compared with existing methods, including the federated learning-based method.
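For readers unfamiliar with the "maximum likelihood model parameter" mentioned above: in a standard Cox proportional hazards model, that parameter maximizes the partial log-likelihood. The following is a minimal sketch of that objective only, assuming no tied event times. The function name `cox_partial_loglik` and its arguments are illustrative and are not taken from the paper. In DC-COX the covariate matrix would presumably hold the integrated intermediate representations rather than raw features, a step this sketch does not implement.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood (illustrative; no correction for tied times).

    beta  : (p,) coefficient vector
    X     : (n, p) covariate matrix
    time  : (n,) observed follow-up times
    event : (n,) 1 if the event was observed, 0 if censored
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    eta = X @ beta  # linear predictor for every subject
    total = 0.0
    for i in np.flatnonzero(event):
        # Risk set: subjects still under observation at the i-th event time.
        at_risk = time >= time[i]
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total
```

Maximizing this function over `beta` (for example, with a generic numerical optimizer) would give the coefficient estimate from which Wald-type statistics such as p-values are usually derived.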