Wang Ran, Xu Cheng, Zhang Shuhao, Ye Fangwen, Tang Yusen, Tang Sisui, Zhang Hangning, Du Wendi, Zhang Xiaotong
School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083, Beijing, China.
Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, 100083, Beijing, China.
Nat Commun. 2024 Oct 28;15(1):9290. doi: 10.1038/s41467-024-53431-x.
The rapid advancement of Industry 4.0 necessitates close collaboration among material research institutions to accelerate the development of novel materials. However, multi-institutional cooperation faces significant challenges in protecting sensitive data, which leads to data silos. In addition, material data are heterogeneous and non-independent and identically distributed (non-i.i.d.), which hinders model accuracy and generalization in collaborative computing. In this paper, we introduce the MatSwarm framework, built on swarm learning, which integrates federated learning with blockchain technology. MatSwarm features two key innovations: a swarm transfer learning method with a regularization term that improves the alignment of local model parameters, and the use of Trusted Execution Environments (TEE) based on Intel SGX for stronger security. Together, these improve accuracy and generalization while keeping data confidential throughout model training and aggregation. Implemented within the National Material Data Management and Services (NMDMS) platform, MatSwarm has aggregated over 14 million material data entries from more than thirty research institutions across China. The framework has demonstrated higher accuracy and better generalization than models trained independently by individual institutions.
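The abstract does not state the form of the regularization term, so the following is only a minimal sketch of the general idea, assuming a proximal penalty of the form (mu/2)·||w − w_ref||² that pulls each institution's local parameters toward a shared reference (the style used in FedProx), followed by sample-size-weighted averaging. The linear model, the function names `local_step`, `aggregate`, and `run_rounds`, and the hyperparameters `mu` and `lr` are all hypothetical and are not taken from MatSwarm; the blockchain and TEE components are not modeled.

```python
import numpy as np

def local_step(w, w_ref, X, y, lr=0.1, mu=0.5):
    """One gradient step on mean-squared error plus an assumed
    proximal penalty (mu/2) * ||w - w_ref||^2."""
    residual = X @ w - y
    grad_loss = X.T @ residual / len(y)   # gradient of 0.5 * mean(residual^2)
    grad_reg = mu * (w - w_ref)           # gradient of the alignment penalty
    return w - lr * (grad_loss + grad_reg)

def aggregate(weights, sizes):
    """Sample-size-weighted average of the local parameter vectors."""
    sizes = np.asarray(sizes, dtype=float)
    return np.average(np.stack(weights), axis=0, weights=sizes)

def run_rounds(datasets, dim, rounds=20, local_steps=5, mu=0.5, lr=0.1):
    """Alternate regularized local training and aggregation.
    `datasets` is a list of (X, y) pairs, one per participant."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_weights = []
        for X, y in datasets:
            w = w_global.copy()
            for _ in range(local_steps):
                w = local_step(w, w_global, X, y, lr=lr, mu=mu)
            local_weights.append(w)
        w_global = aggregate(local_weights, [len(y) for _, y in datasets])
    return w_global
```

In this sketch a larger `mu` keeps local parameters closer to the shared reference, trading some local fit for alignment across participants, which is one plausible way to limit drift on non-i.i.d. data.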