Danek Benjamin P, Makarious Mary B, Dadu Anant, Vitale Dan, Lee Paul Suhwan, Singleton Andrew B, Nalls Mike A, Sun Jimeng, Faghri Faraz
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA.
Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
Patterns (N Y). 2024 Mar 1;5(3):100945. doi: 10.1016/j.patter.2024.100945. eCollection 2024 Mar 8.
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by limited access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction. We find that FL model performance tracks that of centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, which is 0.014 ± 0.003 lower than its centrally trained counterpart. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
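The federated-versus-centralized comparison described in the abstract can be illustrated with a minimal simulation. The sketch below is an assumption for illustration only, not the paper's pipeline or any of the FL frameworks it implements: it trains a logistic-regression classifier on synthetic data both centrally and via single-round, sample-weighted parameter averaging across simulated sites, then compares AUC-PR using scikit-learn's average_precision_score. The client count, data sizes, and model choice are all illustrative.

```python
# Minimal sketch: centralized training vs. simulated federated averaging
# of logistic-regression weights, evaluated by AUC-PR (average precision).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a multi-omics case/control feature matrix.
X, y = make_classification(n_samples=4000, n_features=200, n_informative=30,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Centralized baseline: all training data pooled at a single site.
central = LogisticRegression(max_iter=2000).fit(X_train, y_train)
central_aucpr = average_precision_score(
    y_test, central.predict_proba(X_test)[:, 1])

# Simulated federation: shard the training data across n_clients "sites",
# train locally, then average parameters weighted by local sample count.
n_clients = 5
shards = np.array_split(rng.permutation(len(X_train)), n_clients)

coefs, intercepts, sizes = [], [], []
for shard in shards:
    local = LogisticRegression(max_iter=2000).fit(X_train[shard], y_train[shard])
    coefs.append(local.coef_)
    intercepts.append(local.intercept_)
    sizes.append(len(shard))

weights = np.asarray(sizes) / sum(sizes)
global_model = LogisticRegression()
global_model.classes_ = np.array([0, 1])
global_model.n_features_in_ = X_train.shape[1]
global_model.coef_ = np.average(np.vstack(coefs), axis=0, weights=weights)[None, :]
global_model.intercept_ = np.array(
    [np.average(np.concatenate(intercepts), weights=weights)])

fed_aucpr = average_precision_score(
    y_test, global_model.predict_proba(X_test)[:, 1])
print(f"central AUC-PR: {central_aucpr:.3f}  federated AUC-PR: {fed_aucpr:.3f}")
```

In this toy setup the federated model typically trails the centralized one by a small margin, mirroring the qualitative pattern reported in the abstract; how the samples are dispersed across the simulated sites (e.g., equal vs. skewed shard sizes) changes that gap.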