State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing (SCICB), East China University of Science and Technology, Shanghai 200237, China.
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
J Phys Chem B. 2024 Mar 14;128(10):2281-2292. doi: 10.1021/acs.jpcb.3c06526. Epub 2024 Mar 4.
Accurate prediction of enzyme optimal temperature (Topt) is crucial for identifying enzymes suitable for catalytic functions under extreme bioprocessing conditions. The optimal growth temperature (OGT) of microorganisms serves as a key indicator for estimating enzyme Topt, reflecting an evolutionary temperature balance between enzyme-catalyzed reactions and the organism's growth environments. Existing OGT databases, collected from culture collection centers, often fall short as culture temperature does not precisely represent the OGT. Models trained on such databases yield inadequate accuracy in enzyme Topt prediction, underscoring the need for a high-quality OGT database. Herein, we developed AI-based models to extract the OGT information from the scientific literature, constructing a comprehensive OGT database with 1155 unique organisms and 2142 OGT values. The top-performing model, BioLinkBERT, demonstrated exceptional information extraction ability with an EM score of 91.00 and an F1 score of 91.91 for OGT. Notably, applying this OGT database in enzyme Topt prediction achieved an R² value of 0.698, outperforming the R² value of 0.686 obtained using culture temperature. This emphasizes the superiority of the OGT database in predicting the enzyme Topt and underscores its pivotal role in identifying enzymes with optimal catalytic temperatures.
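The EM and F1 scores reported above are standard measures for extractive information extraction. The abstract does not state how they were computed, so the following is only a minimal sketch, assuming SQuAD-style definitions: exact match after light normalization, and token-level F1 over whitespace tokens. The function names and the normalization (lowercasing and whitespace collapsing only) are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization, kept
    deliberately light so that values such as '37.5' stay intact)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer span."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical extracted OGT span versus a hypothetical gold annotation.
    print(exact_match("37 °C", "37 °c"))
    print(token_f1("37 °C", "about 37 °C"))
```

Under these assumptions, a corpus-level score would be the mean of the per-example values multiplied by 100, which matches the scale of the figures quoted in the abstract.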