Zhou Jessica, Rizzo Kaeli, Tang Ziqi, Koo Peter K
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA.
Currently at InstaDeep, Cambridge, MA, USA.
bioRxiv. 2024 Nov 15:2024.11.13.623485. doi: 10.1101/2024.11.13.623485.
Deep neural networks (DNNs) have advanced predictive modeling for regulatory genomics, but challenges remain in ensuring the reliability of their predictions and understanding the key factors behind their decision making. Here we introduce DEGU (Distilling Ensembles for Genomic Uncertainty-aware models), a method that integrates ensemble learning and knowledge distillation to improve the robustness and explainability of DNN predictions. DEGU distills the predictions of an ensemble of DNNs into a single model, capturing both the average of the ensemble's predictions and the variability across them, with the latter representing epistemic (or model-based) uncertainty. DEGU also includes an optional auxiliary task to estimate aleatoric, or data-based, uncertainty by modeling variability across experimental replicates. By applying DEGU across various functional genomic prediction tasks, we demonstrate that DEGU-trained models inherit the performance benefits of ensembles in a single model, with improved generalization to out-of-distribution sequences and more consistent explanations of cis-regulatory mechanisms through attribution analysis. Moreover, DEGU-trained models provide calibrated uncertainty estimates, with conformal prediction offering coverage guarantees under minimal assumptions. Overall, DEGU paves the way for robust and trustworthy applications of deep learning in genomics research.
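As a rough illustration of the two ideas in the abstract (distilling an ensemble's mean and spread into training targets for a single model, and wrapping the resulting uncertainty estimates in conformal intervals), here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the use of the standard deviation as the measure of variability, and the choice of normalized split conformal prediction are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import numpy as np


def distillation_targets(ensemble_preds):
    """Build per-sequence targets for a distilled (student) model.

    ensemble_preds: array-like of shape (n_models, n_sequences).
    Returns (mean, std) across models. Using the standard deviation
    as the epistemic-uncertainty target is an assumption of this sketch.
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0)


def conformal_scale(y_cal, mean_cal, std_cal, alpha=0.1, eps=1e-8):
    """Normalized split-conformal quantile on a held-out calibration set.

    Scores are |y - mean| / std; the returned value q is the
    ceil((n + 1) * (1 - alpha))-th smallest score, or infinity when the
    calibration set is too small for the requested coverage.
    """
    y_cal = np.asarray(y_cal, dtype=float)
    mean_cal = np.asarray(mean_cal, dtype=float)
    std_cal = np.asarray(std_cal, dtype=float)
    scores = np.sort(np.abs(y_cal - mean_cal) / (std_cal + eps))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= n else float("inf")


def prediction_interval(mean, std, q, eps=1e-8):
    """Interval mean +/- q * std, using the scale from conformal_scale."""
    half_width = q * (np.asarray(std, dtype=float) + eps)
    mean = np.asarray(mean, dtype=float)
    return mean - half_width, mean + half_width
```

In this sketch, a student network would be trained to regress onto both outputs of `distillation_targets`, and its predicted mean and uncertainty on a calibration set would then be passed to `conformal_scale`. The coverage guarantee of split conformal prediction relies on the calibration and test examples being exchangeable.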