AI Lab, Lenovo, Beijing, China.
Comb Chem High Throughput Screen. 2021;24(7):933-946. doi: 10.2174/1386207323666201022110616.
As artificial intelligence and big data analysis develop rapidly, data privacy, especially the privacy of patient medical data, is receiving increasing attention.
This study aims to strengthen the protection of private data without compromising model training. It introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis, which lets researchers from different medical institutions train models collaboratively without exchanging sensitive patient data.
A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. Through peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task on another Blockchain. In addition, after the collaborative training process, personalized models are trained for the individual medical institutions.
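The partial parameter update idea can be illustrated with a minimal sketch. The abstract does not state how the shared parameters are chosen, so the rule used here (share only the largest-magnitude fraction of each local update and average the shared parts) is an assumption, and the names `select_partial_update`, `apply_shared_updates`, and `share_fraction` are hypothetical rather than taken from the paper.

```python
from typing import List


def select_partial_update(delta: List[float], share_fraction: float) -> List[float]:
    """Keep the largest-magnitude entries of a local update and zero the rest.

    Assumption: selection by magnitude; the paper may use a different rule.
    """
    if not 0.0 < share_fraction <= 1.0:
        raise ValueError("share_fraction must be in (0, 1]")
    keep = max(1, int(len(delta) * share_fraction))
    # Indices of the `keep` entries with the largest absolute value.
    top = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)[:keep]
    shared = [0.0] * len(delta)
    for i in top:
        shared[i] = delta[i]
    return shared


def apply_shared_updates(weights: List[float], updates: List[List[float]]) -> List[float]:
    """Add the element-wise mean of the participants' partial updates to the weights."""
    n = len(updates)
    return [w + sum(u[i] for u in updates) / n for i, w in enumerate(weights)]


# One institution shares half of its local update; the other entries stay private.
local_delta = [0.5, -2.0, 0.1, 1.0]
shared = select_partial_update(local_delta, share_fraction=0.5)
```

In this sketch only `shared` would leave the institution, so a peer sees part of the update rather than the full local gradient. That is the sense in which a partial update limits indirect leakage during model propagation.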
The experimental results show that our method achieves performance similar to that of centralized training on the pooled data sets of all participants, while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain is also shown to accelerate model convergence and improve model accuracy, especially when data are lacking. The personalization training process further improves model performance.
Our approach can effectively help researchers from different organizations train models collaboratively without disclosing their private data.