Rahul Sheshanarayana, Fengqi You
College of Engineering, Cornell University, Ithaca, NY, 14853, USA.
Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, 14853, USA.
Adv Sci (Weinh). 2025 Jun;12(22):e2503271. doi: 10.1002/advs.202503271. Epub 2025 Apr 9.
Knowledge distillation (KD) is a powerful model compression technique that transfers knowledge from complex teacher models to compact student models, reducing computational costs while preserving predictive accuracy. This study investigated KD's efficacy in molecular property prediction across domain-specific and cross-domain tasks, leveraging state-of-the-art graph neural networks (SchNet, DimeNet++, and TensorNet). In the domain-specific setting, KD improved regression performance across diverse quantum mechanical properties in the QM9 dataset, with DimeNet++ student models achieving up to a 90% improvement over non-KD baselines. Notably, in certain cases, student models achieved comparable or even superior improvements while being 2× smaller than their teachers, highlighting KD's ability to enhance efficiency without sacrificing predictive performance. Cross-domain evaluations further demonstrated KD's adaptability: embeddings from QM9-trained teacher models enhanced predictions for ESOL (logS) and FreeSolv (ΔG), with SchNet exhibiting the highest gains of ≈65% in logS predictions. Embedding analysis revealed substantial student-teacher alignment gains, with the relative shift in cosine similarity distribution peaks reaching up to 1.0 across student models. These findings highlight KD as a robust strategy for enhancing molecular representation learning, with implications for cheminformatics, materials science, and drug discovery.
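The abstract does not specify the distillation objective used in the study; the sketch below illustrates one common feature-based KD setup for molecular property regression, assuming a frozen teacher and a smaller student that each return an embedding and a scalar prediction. All module names, the batch layout, and the loss weighting `alpha` are illustrative, not the paper's implementation.

```python
# Minimal sketch of feature-based knowledge distillation for molecular
# property regression (hypothetical setup; not the paper's exact code).
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, batch, optimizer, alpha=0.5):
    """One KD training step: a supervised regression loss on the true
    property plus an embedding-alignment term toward the teacher."""
    teacher.eval()
    with torch.no_grad():
        # The frozen teacher provides target embeddings (and predictions).
        t_emb, t_pred = teacher(batch)

    s_emb, s_pred = student(batch)

    # Supervised loss on the ground-truth property (e.g., a QM9 target).
    task_loss = F.l1_loss(s_pred, batch["y"])

    # Distillation loss: pull student embeddings toward teacher embeddings.
    # (A projection layer would be needed if their dimensions differ.)
    distill_loss = 1.0 - F.cosine_similarity(s_emb, t_emb, dim=-1).mean()

    loss = (1 - alpha) * task_loss + alpha * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this formulation the same recipe covers both settings described above: for domain-specific KD, teacher and student are trained on the same QM9 target; for cross-domain KD, the QM9-trained teacher supplies embeddings while the student is supervised on ESOL or FreeSolv labels.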
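The reported "relative shift in cosine similarity distribution peaks" can be made concrete with a small analysis sketch: compute per-molecule student-teacher cosine similarities before and after distillation, histogram them, and compare the distribution peaks. This is an assumed reading of the metric; array names and binning are placeholders.

```python
# Hypothetical sketch of the embedding-alignment analysis described in the
# abstract (peak of the student-teacher cosine similarity distribution).
import numpy as np

def cosine_similarity_peak(student_emb, teacher_emb, bins=100):
    """Return the center of the most populated histogram bin of the
    per-sample cosine similarities between two (N, D) embedding arrays."""
    num = (student_emb * teacher_emb).sum(axis=1)
    denom = np.linalg.norm(student_emb, axis=1) * np.linalg.norm(teacher_emb, axis=1)
    cos = num / np.clip(denom, 1e-12, None)
    counts, edges = np.histogram(cos, bins=bins, range=(-1.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.argmax(counts)]

# Relative peak shift attributable to KD (the abstract reports values up to 1.0):
# shift = cosine_similarity_peak(emb_after_kd, teacher_emb) \
#       - cosine_similarity_peak(emb_before_kd, teacher_emb)
```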