College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China.
Haihe Laboratory of Synthetic Biology, Tianjin 300308, China.
Int J Mol Sci. 2024 Apr 28;25(9):4803. doi: 10.3390/ijms25094803.
The molecular weight (MW) of an enzyme is a critical parameter in enzyme-constrained models (ecModels). It is determined by two factors: which subunits are present and the abundance of each subunit. Although the number of subunits (NS) can in principle be obtained from UniProt, this information is not readily available for most proteins. In this study, we addressed this gap by extracting and curating subunit information from the UniProt database to establish a benchmark dataset. We then developed a model named DeepSub, which combines a protein language model with a bidirectional gated recurrent unit (GRU) to predict NS in homo-oligomers from protein sequences alone. DeepSub achieved an accuracy as high as 0.967, surpassing QUEEN. To validate DeepSub, we made predictions for protein homo-oligomers that have been reported in the literature but are not documented in the UniProt database. Examples include homoserine dehydrogenase, Matrilin-4, and the Multimerins protein family. The predictions align closely with the findings reported in the literature, supporting the reliability and utility of DeepSub.
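The relationship between subunit composition and enzyme MW described above can be illustrated with a minimal Python sketch. It assumes that "abundance" means the copy number of each subunit per complex, so that the complex MW is the sum of monomer MW times copy number. The function names and all numeric values are hypothetical illustrations, not part of DeepSub or of any ecModel toolbox.

```python
from typing import Mapping, Tuple


def complex_molecular_weight(subunits: Mapping[str, Tuple[float, int]]) -> float:
    """Return the MW (Da) of a complex.

    `subunits` maps a subunit name to (monomer MW in Da, copy number).
    Assumption: the complex MW is the sum over subunit types of
    monomer MW multiplied by copy number.
    """
    total = 0.0
    for name, (monomer_mw, copies) in subunits.items():
        if monomer_mw <= 0 or copies < 1:
            raise ValueError(f"invalid entry for subunit {name!r}")
        total += monomer_mw * copies
    return total


def homo_oligomer_molecular_weight(monomer_mw: float, n_subunits: int) -> float:
    """Special case for homo-oligomers: a single subunit type repeated NS times."""
    return complex_molecular_weight({"monomer": (monomer_mw, n_subunits)})


# Hypothetical values: a 50 kDa monomer predicted to form a tetramer (NS = 4).
tetramer_mw = homo_oligomer_molecular_weight(50000.0, 4)

# Hypothetical hetero-oligomer with two copies each of two subunit types.
hetero_mw = complex_molecular_weight({"alpha": (30000.0, 2), "beta": (20000.0, 2)})
```

In this reading, a predicted NS for a homo-oligomer is the only extra quantity needed to turn a monomer MW into the complex MW used by an ecModel.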