Jiang Renjing, Yue Zhenrui, Shang Lanyu, Wang Dong, Wei Na
Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States.
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, 61820, United States.
Metab Eng Commun. 2024 Sep 5;19:e00248. doi: 10.1016/j.mec.2024.e00248. eCollection 2024 Dec.
Plastic waste has caused a global environmental crisis. Enzyme-mediated biocatalytic depolymerization has emerged as an efficient and sustainable alternative for plastic treatment and recycling. However, discovering novel plastic-degrading enzymes with conventional cultivation-based or omics methods is challenging and time-consuming. There is growing interest in computational methods that identify new enzymes with desirable plastic-degrading functions by exploring the ever-expanding databases of protein sequences. In this study, we designed an innovative machine learning-based framework, named PEZy-Miner, to mine for enzymes with high potential to degrade plastics of interest. Two datasets were created, one integrating information from experimentally verified enzymes and the other from homologs with unknown plastic-degrading activity, covering eleven types of plastic substrates. Protein language models and binary classification models were developed to predict enzymatic degradation of plastics, along with confidence and uncertainty estimates. PEZy-Miner exhibited high prediction accuracy and stability when validated on experimentally verified enzymes. Furthermore, when the experimentally verified enzymes were masked and blended into the homolog dataset, PEZy-Miner enriched the experimentally verified entries by 14- to 30-fold while shortlisting promising plastic-degrading enzyme candidates. We applied PEZy-Miner to 100,000 putative sequences, of which 27 new sequences were identified with high confidence. This study provides a new computational tool for mining and recommending promising new plastic-degrading enzymes.
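The enrichment reported for the masking experiment can be illustrated with a small calculation. The sketch below is a minimal example and not the authors' implementation: it assumes enrichment is defined as the fraction of verified entries in a top-k shortlist divided by their fraction in the whole blended pool, which is a common definition that the abstract does not confirm. The function name `enrichment_factor`, its parameters, and the toy scores are hypothetical.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_verified: Sequence[bool],
    top_k: int,
) -> float:
    """Fold enrichment of verified entries in the top_k highest-scoring sequences.

    Assumed definition (not taken from the paper):
        (verified fraction in shortlist) / (verified fraction in full pool)
    """
    if len(scores) != len(is_verified):
        raise ValueError("scores and is_verified must have the same length")
    if not 0 < top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the pool size")

    total_verified = sum(is_verified)
    if total_verified == 0:
        raise ValueError("the pool contains no verified entries")

    # Rank sequence indices by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    shortlist = ranked[:top_k]

    # Count how many masked, verified entries made it into the shortlist.
    hits = sum(is_verified[i] for i in shortlist)

    shortlist_rate = hits / top_k
    pool_rate = total_verified / len(scores)
    return shortlist_rate / pool_rate


# Toy pool: 10 sequences, 2 of which are masked verified enzymes (hypothetical data).
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.05, 0.6, 0.15, 0.25]
toy_verified = [True, True, False, False, False, False, False, False, False, False]
toy_enrichment = enrichment_factor(toy_scores, toy_verified, top_k=2)
```

Under this definition, a value of 1 means the shortlist is no richer in verified enzymes than the pool as a whole, and larger values mean the ranking concentrates verified entries near the top.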