Zhu Wei, Luo Jiebo, White Andrew D
Department of Computer Science, University of Rochester, Rochester, NY, USA.
Department of Chemical Engineering, University of Rochester, Rochester, NY, USA.
Patterns (N Y). 2022 Jun 2;3(6):100521. doi: 10.1016/j.patter.2022.100521. eCollection 2022 Jun 10.
Experiments in chemistry research carry high material and computational costs. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data.
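The two ingredients named in the abstract, a heterogeneous split of molecules across clients and a global model built without pooling data, can be illustrated with a minimal sketch. It is not the FedChem or FLIT(+) implementation. It assumes that each molecule already carries an integer scaffold group ID, that heterogeneity is simulated with a common Dirichlet-based scheme (smaller `alpha` gives more skewed clients), and that aggregation is plain size-weighted parameter averaging. The names `dirichlet_partition` and `federated_average` are hypothetical.

```python
import numpy as np


def dirichlet_partition(group_ids, num_clients, alpha, seed=0):
    """Split sample indices across clients, one scaffold group at a time.

    Assumption: for every group, client shares are drawn from a symmetric
    Dirichlet(alpha) distribution. Small alpha concentrates a group on a
    few clients, which makes the clients heterogeneous.
    """
    rng = np.random.default_rng(seed)
    group_ids = np.asarray(group_ids)
    client_indices = [[] for _ in range(num_clients)]
    for group in np.unique(group_ids):
        idx = np.flatnonzero(group_ids == group)
        rng.shuffle(idx)
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative shares into cut points within this group.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


def federated_average(client_params, client_sizes):
    """Size-weighted average of client parameter vectors.

    Only parameters and sample counts leave a client; the raw training
    data stays local.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * p for c, p in zip(coeffs, client_params))


if __name__ == "__main__":
    # Toy data: 30 molecules in 3 scaffold groups, split over 3 clients.
    groups = [0] * 10 + [1] * 10 + [2] * 10
    parts = dirichlet_partition(groups, num_clients=3, alpha=0.5, seed=1)
    print([len(p) for p in parts])

    # Two clients holding 1 and 3 samples send their local parameters.
    local = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
    print(federated_average(local, client_sizes=[1, 3]))
```

In this sketch every molecule index lands on exactly one client, so the clients' datasets are disjoint, and the averaged parameters lean toward the client with more samples. Instance reweighting as in FLIT(+) would change how each client trains locally and is not shown here.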