Kim Yejin, Sun Jimeng, Yu Hwanjo, Jiang Xiaoqian
Pohang University of Science and Technology, Pohang, Korea.
University of California, San Diego, La Jolla, CA.
KDD. 2017 Aug;2017:887-895. doi: 10.1145/3097983.3098118.
Tensor factorization models offer an effective approach to converting massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large number of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals when direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution that enables federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on the alternating direction method of multipliers (ADMM). With this method, the hospitals iteratively update tensors and transfer secure summarized information to a central server, which aggregates the information to generate phenotypes. Using real medical datasets, we demonstrated that our method resembles the centralized training model (trained on the combined datasets) in terms of accuracy and phenotype discovery while respecting privacy.
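The iterate-and-aggregate pattern described in the abstract can be illustrated with a minimal consensus ADMM sketch. This is not the paper's algorithm: it swaps tensor factorization for a plain least-squares problem so the result can be checked against a centralized solve, and it models neither the secure data harmonization nor any protection of the transmitted summaries. All function names, the synthetic data, and the penalty `rho` are assumptions chosen for illustration.

```python
import numpy as np

# Simplified stand-in (assumption): each "hospital" holds rows (A_k, b_k) of a
# shared least-squares problem instead of a patient tensor.

def local_update(A, b, z, u, rho):
    """Site-side step: minimize 0.5*||A x - b||^2 + (rho/2)*||x - z + u||^2."""
    d = A.shape[1]
    lhs = A.T @ A + rho * np.eye(d)
    rhs = A.T @ b + rho * (z - u)
    return np.linalg.solve(lhs, rhs)

def federated_admm(sites, rho=50.0, n_iter=200):
    """Consensus ADMM in scaled form; the server only sees x_k + u_k vectors."""
    d = sites[0][0].shape[1]
    z = np.zeros(d)                          # global (consensus) variable
    duals = [np.zeros(d) for _ in sites]     # one scaled dual per site
    for _ in range(n_iter):
        # Each site solves its local problem using only its own rows.
        xs = [local_update(A, b, z, u, rho) for (A, b), u in zip(sites, duals)]
        # Server step: average the summaries sent by the sites.
        z = np.mean([x + u for x, u in zip(xs, duals)], axis=0)
        # Dual step: each site accumulates its disagreement with the consensus.
        duals = [u + x - z for x, u in zip(xs, duals)]
    return z

def make_sites(seed=0, sizes=(40, 60, 80), d=5):
    """Synthetic data for three hypothetical sites sharing one true weight vector."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    sites = []
    for n in sizes:
        A = rng.normal(size=(n, d))
        b = A @ w + 0.1 * rng.normal(size=n)
        sites.append((A, b))
    return sites

def centralized_solution(sites):
    """Reference: pool all rows, which the federated setting does not allow."""
    A = np.vstack([A for A, _ in sites])
    b = np.concatenate([b for _, b in sites])
    return np.linalg.lstsq(A, b, rcond=None)[0]

if __name__ == "__main__":
    sites = make_sites()
    z = federated_admm(sites)
    gap = np.max(np.abs(z - centralized_solution(sites)))
    print("max gap to centralized solution:", gap)
```

In this toy setting the consensus variable converges to the pooled least-squares solution even though no site ever transmits its rows, which mirrors the abstract's comparison against a centralized model. The choice of `rho` affects only the speed of convergence here, not the solution reached.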