Yang Yuxin, Jerger Abby, Feng Song, Wang Zixu, Brasfield Christina, Cheung Margaret S, Zucker Jeremy, Guan Qiang
Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH, 44195, USA.
Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH, 44195, USA.
Commun Biol. 2024 Dec 23;7(1):1690. doi: 10.1038/s42003-024-07359-z.
Deep learning has made remarkable progress across scientific disciplines in recent years. A prominent challenge is predicting enzyme function, a complex problem for which numerous computational methods have been developed, particularly methods based on deep learning. However, most of these methods use either amino acid sequence data or protein structure data alone, neglecting the potential synergy of combining both modalities. To address this gap, we propose a Contrastive Learning framework for Enzyme functional ANnotation prediction that combines protein amino acid sequences and Contact maps (CLEAN-Contact). We rigorously evaluate CLEAN-Contact against state-of-the-art enzyme function prediction models on multiple benchmark datasets. Using CLEAN-Contact, we predict previously unknown enzyme functions within the proteome of Prochlorococcus marinus MED4. Our findings demonstrate that CLEAN-Contact substantially outperforms these models, marking a significant step forward in enzyme function prediction accuracy.
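As a rough illustration of the two ingredients named in the abstract, the sketch below builds a binary contact map from residue coordinates and computes a triplet margin loss, one common contrastive objective. It is not the authors' implementation: the abstract does not state which loss, distance cutoff, or encoders CLEAN-Contact uses. The function names `contact_map` and `triplet_margin_loss`, the 8 angstrom cutoff, and the margin of 1.0 are all assumptions chosen for illustration.

```python
import numpy as np


def contact_map(coords, threshold=8.0):
    """Return a binary (N, N) contact map from an (N, 3) array of residue coordinates.

    Two residues count as "in contact" when their distance is below `threshold`.
    The 8.0 angstrom default is an assumed, commonly used cutoff, not a value
    taken from the abstract.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise differences via broadcasting: shape (N, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < threshold).astype(np.uint8)


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on three embedding vectors.

    The loss is zero once the anchor is closer to the positive (same class)
    than to the negative (different class) by at least `margin`.
    The margin value is a hypothetical default.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, float(d_pos - d_neg + margin))


if __name__ == "__main__":
    # Three residues on a line: the first two are close, the third is far away.
    toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [20.0, 0.0, 0.0]]
    print(contact_map(toy_coords))

    # Toy embeddings standing in for combined sequence and contact-map features.
    a = np.array([0.0, 0.0])
    p = np.array([0.0, 1.0])
    n = np.array([3.0, 4.0])
    print(triplet_margin_loss(a, p, n))
```

In a framework of this kind, the anchor and positive would presumably be embeddings of enzymes sharing a functional label and the negative an enzyme with a different label; how CLEAN-Contact actually forms those embeddings from sequences and contact maps is described in the paper itself, not here.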