Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.
School of Natural Sciences, University of California Merced, Merced, California, USA.
Biotechnol Bioeng. 2021 Feb;118(2):759-769. doi: 10.1002/bit.27608. Epub 2020 Nov 14.
Growing industrial utilization of enzymes and the increasing availability of metagenomic data highlight the demand for effective methods of targeted identification and verification of novel enzymes from various environmental microbiota. Xylanases are a class of enzymes with numerous industrial applications and are involved in the degradation of xylan, a component of lignocellulose. The optimum temperature of an enzyme is an essential factor to consider when choosing an appropriate biocatalyst for a particular purpose. Therefore, in silico prediction of this attribute is a significant cost- and time-saving step in the effort to characterize novel enzymes. The objective of this study was to develop a computational method to predict the thermal dependence of xylanases. This tool was then applied to targeted screening of putative xylanases with specific thermal dependencies from metagenomic data, which resulted in the identification of three novel xylanases from sheep and cow rumen microbiota. Here we present Thermal Activity Prediction for Xylanase (TAXyl), a new sequence-based machine learning method that has been trained using a selected combination of various protein features. This random forest classifier discriminates between non-thermophilic, thermophilic, and hyper-thermophilic xylanases. The model's performance was evaluated through multiple iterations of sixfold cross-validation as well as holdout tests, and it is freely accessible as a web service at arimees.com.
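The evaluation scheme described above (repeated sixfold cross-validation over three thermal classes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not list the protein features used or say whether folds were stratified, so amino-acid composition stands in as one plausible sequence-derived feature, stratification by class is an assumption, and all function names are hypothetical. Training the random forest itself is left out.

```python
import random
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Assumption: a stand-in feature; the abstract does not name its features.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def stratified_folds(labels, n_folds=6, seed=0):
    """Deal sample indices into n_folds folds, class by class.

    Assumption: stratification is not stated in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Round-robin dealing keeps class proportions similar across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def repeated_cv_splits(labels, n_folds=6, n_repeats=3):
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV."""
    for repeat in range(n_repeats):
        # A different seed per repeat gives a different partition each time.
        folds = stratified_folds(labels, n_folds, seed=repeat)
        for held_out in range(n_folds):
            test = sorted(folds[held_out])
            train = sorted(
                index
                for fold_number, fold in enumerate(folds)
                if fold_number != held_out
                for index in fold
            )
            yield repeat, train, test
```

Each yielded split would be used to fit a classifier on the training indices and score it on the held-out fold; averaging the scores over all repeats gives a more stable estimate than a single sixfold run. A separate holdout set, as mentioned in the abstract, would be set aside before any of these splits are generated.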