Zhang Enpei, Chai Jingyi, Ye Rui, Wang Yanfeng, Chen Siheng
Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China.
School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China.
Nat Commun. 2025 Aug 25;16(1):7923. doi: 10.1038/s41467-025-62959-5.
Data plays a crucial role in training contemporary AI models, but much of the available public data will be exhausted within a few years, directing the world's attention toward massive amounts of decentralized private data. However, the privacy-sensitive nature of raw data and the lack of an incentive mechanism prevent these valuable data from being fully exploited. Here we propose inclusive and incentivized personalized federated learning (iPFL), which incentivizes data holders with diverse purposes to collaboratively train personalized models without revealing raw data. iPFL constructs a model-sharing market by solving a graph-based training optimization and incorporates an incentive mechanism based on game-theoretic principles. Theoretical analysis shows that iPFL satisfies two key incentive properties: individual rationality and incentive compatibility. Empirical studies on eleven AI tasks (e.g., large language models' instruction-following tasks) demonstrate that iPFL consistently achieves the highest economic utility, with model performance better than or comparable to that of baseline methods.
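The abstract does not specify iPFL's mechanism in detail, but the two incentive properties it claims are standard game-theoretic notions. As a minimal, hypothetical sketch (not the paper's mechanism), the classic second-price sealed-bid auction illustrates both: individual rationality (truthful participation never yields negative utility) and incentive compatibility (no participant gains by misreporting their true valuation), which we can verify numerically on a toy instance.

```python
# Toy illustration of individual rationality (IR) and incentive
# compatibility (IC) via a second-price sealed-bid auction.
# This is a generic example, NOT the mechanism used in iPFL.

def second_price(bids):
    """Return (winner index, payment): highest bidder wins, pays 2nd-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    return order[0], bids[order[1]]

def utility(values, bids, i):
    """Participant i's utility: true value minus payment if i wins, else 0."""
    winner, price = second_price(bids)
    return values[i] - price if winner == i else 0.0

values = [10.0, 7.0, 3.0]  # hypothetical true valuations

# IR: truthful bidding never yields negative utility for any participant.
assert all(utility(values, values, i) >= 0 for i in range(len(values)))

# IC: no unilateral deviation from a truthful bid improves utility.
for i in range(len(values)):
    truthful = utility(values, values, i)
    for deviation in [0.0, 1.0, 5.0, 8.0, 12.0, 20.0]:
        bids = list(values)
        bids[i] = deviation
        assert utility(values, bids, i) <= truthful + 1e-9
```

iPFL's market-based mechanism operates over a collaboration graph rather than a single-item auction, but the same two properties are what its theoretical analysis establishes.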