US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada.
Commun Biol. 2024 Sep 12;7(1):1124. doi: 10.1038/s42003-024-06681-w.
Thermophily is a trait scattered across the fungal tree of life, with its highest prevalence in three fungal families (Chaetomiaceae, Thermoascaceae, and Trichocomaceae) as well as in some members of the phylum Mucoromycota. For this study, we examined 37 thermophilic and thermotolerant species and 42 mesophilic species, and we identified thermophily as the ancestral state of all three prominent families of thermophilic fungi. Thermophilic fungal genomes were found to encode various thermostable enzymes, including carbohydrate-active enzymes such as endoxylanases, which are useful in many industrial applications. At the same time, overall gene counts are reduced in thermophiles compared with mesophiles, especially in gene families responsible for microbial defense such as secondary metabolism. We also found a reduction in the core genome size of thermophiles in both the Chaetomiaceae family and the Eurotiomycetes class. The Gene Ontology terms lost in thermophilic fungi include primary metabolism, transporters, UV response, and O-methyltransferases. Comparative genomic analysis also revealed higher GC content at the third codon position (GC3) and a lower effective number of codons in fungal thermophiles than in both thermotolerant and mesophilic fungi. Furthermore, using a support vector machine (SVM) classifier, we identified several Pfam domains capable of discriminating between the genomes of thermophiles and mesophiles with 94% accuracy. Using AlphaFold2 to predict the protein structures of endoxylanases (GH10), we built a structure-based similarity network. We found that the number of disulfide bonds appears to be important for protein structure and that the network clusters based on protein structures correlate with the optimal activity temperature. Thus, comparative genomics offers new insights into the biology, adaptation, and evolutionary history of thermophilic fungi while providing a parts list for bioengineering applications.
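GC3 and the effective number of codons (ENC) are standard codon-usage statistics. The sketch below shows one way to compute them for a single in-frame coding sequence. It assumes the standard genetic code and Wright's (1990) homozygosity-based ENC, and it takes GC3 over sense codons only. The function names are hypothetical, and this is not presented as the pipeline used in the study. ENC ranges from 20 (one codon used per amino acid, maximal bias) to 61 (all synonymous codons used equally), so a lower ENC indicates stronger codon bias.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq):
    """Split an in-frame coding sequence into complete, unambiguous codons."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if c in CODON_TABLE]  # drops N-containing codons


def gc3(seq):
    """Fraction of sense codons whose third base is G or C (assumption: stops excluded)."""
    sense = [c for c in codons(seq) if CODON_TABLE[c] != "*"]
    if not sense:
        return float("nan")
    return sum(c[2] in "GC" for c in sense) / len(sense)


def effective_number_of_codons(seq):
    """Wright's ENC: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    counts = Counter(codons(seq))
    # Group synonymous sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    # Per-amino-acid homozygosity F, grouped by degeneracy class (2, 3, 4 or 6 codons).
    f_by_class = defaultdict(list)
    for syn in families.values():
        k = len(syn)
        if k == 1:
            continue  # Met and Trp offer no synonymous choice
        n = sum(counts[c] for c in syn)
        if n < 2:
            continue  # F is undefined with fewer than two observations
        p_sq = sum((counts[c] / n) ** 2 for c in syn)
        f_by_class[k].append((n * p_sq - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Convention when isoleucine (the only 3-fold class) is absent.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(mean_f.get(k, 0) <= 0 for k in (2, 3, 4, 6)):
        return float("nan")  # not enough data in at least one class
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(61.0, nc)
```

ENC is only stable for reasonably long genes. For short sequences this sketch returns nan when a degeneracy class has no usable data, rather than extrapolating.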