Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium.
Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium.
BMC Genomics. 2024 Jun 27;25(1):643. doi: 10.1186/s12864-024-10554-1.
The CBM13 family comprises carbohydrate-binding modules that occur mainly in enzymes and in several ricin-B lectins. The ricin-B lectin domain resembles the CBM13 module to a large extent. Historically, ricin-B lectins and CBM13 proteins were considered completely distinct, despite their structural and functional similarities.
In this data mining study, we investigated the structural and functional similarities of these intertwined protein groups. Their close structural and functional resemblance, combined with inconsistent nomenclature across databases, can cause confusion. First, we demonstrate how public protein databases use different nomenclature systems to describe CBM13 modules and putative ricin-B lectin domains. We suggest the introduction of a novel CBM13 domain identifier, as well as the extension of CAZy cross-references in UniProt, to preserve the distinction between CAZy and non-CAZy entries in public databases. Since similar problems may occur with other lectin families and CBM families, we suggest the introduction of novel CBM InterPro domain identifiers for all existing CBM families. Second, we investigated phylogenetic, nomenclatural and structural similarities between putative ricin-B lectin domains and CBM13 modules, making use of sequence similarity networks. We conclude that the ricin-B/CBM13 superfamily may be larger than initially thought and that several putative ricin-B lectin domains may display CAZyme functionalities, although biochemical proof remains to be delivered.
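As an illustration of the sequence similarity network idea mentioned above, the following is a minimal sketch and not the pipeline used in the study. It assumes pairwise similarity scores are already available, keeps only the pairs at or above a chosen threshold as edges, and reports the resulting clusters as connected components. The sequence labels, the scores and the thresholds are toy values invented for the example, and the function names `build_ssn` and `clusters` are hypothetical.

```python
from collections import defaultdict


def build_ssn(pairwise_scores, threshold):
    """Build an adjacency map, keeping an edge only if its score meets the threshold."""
    adjacency = defaultdict(set)
    for (a, b), score in pairwise_scores.items():
        # Touch both keys so that isolated sequences still appear as nodes.
        adjacency[a]
        adjacency[b]
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def clusters(adjacency):
    """Return the connected components of the network as sorted lists of node names."""
    seen = set()
    result = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Toy scores, invented for illustration only (not data from the study).
toy_scores = {
    ("cbm13_a", "cbm13_b"): 120,
    ("cbm13_b", "ricinB_a"): 45,
    ("ricinB_a", "ricinB_b"): 110,
    ("cbm13_a", "other"): 5,
}

strict = clusters(build_ssn(toy_scores, threshold=100))
relaxed = clusters(build_ssn(toy_scores, threshold=40))
print(strict)
print(relaxed)
```

In this toy case the stricter threshold keeps the two groups apart, while the relaxed threshold merges them into one cluster through a single weaker edge. This shows, in miniature, how the choice of threshold determines whether related domain families appear as separate groups or as one larger cluster.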
Ricin-B lectin domains and CBM13 modules are associated groups of proteins whose database semantics are currently biased towards ricin-B lectins. Revision of the CAZy cross-reference in UniProt and introduction of a dedicated CBM13 domain identifier in InterPro may resolve this issue. In addition, our analyses show that several proteins with putative ricin-B lectin domains display very strong structural similarity to CBM13 modules. Therefore, ricin-B lectin domains and CBM13 modules could be considered distant members of a larger ricin-B/CBM13 superfamily.