Xu Tong, Huang Danyang, Huang Tingting, Wang Yuxin, Chen Wanqiu, Chen Shijunyin, Qian Yurong, Yue Haitao
School of Life Science and Technology, Xinjiang University, Urumqi, 830017, China.
School of Pharmaceutical Sciences and Institute of Materia Medica, Xinjiang University, Urumqi, 830017, China.
Synth Syst Biotechnol. 2025 Jul 21;10(4):1377-1387. doi: 10.1016/j.synbio.2025.07.006. eCollection 2025 Dec.
Studies of microbial communities have established that enzymes play pivotal catalytic roles in ecosystem metabolism, yet cultivation-dependent methods cannot exploit the enzyme resources of uncultured microorganisms. Metagenomics overcomes this limitation by accessing microbial genetic information directly, but the volume of data it generates makes precise enzyme identification difficult, leaving two main limitations: (1) restricted applicability across varied sample types and (2) a narrow functional scope in target enzyme discovery. To address these limitations, we developed Gene Surfing, a Snakemake-based bioinformatics workflow platform. It integrates modules for data quality control (Fastp), genome assembly (MEGAHIT), assembly evaluation (QUAST and MetaQUAST), functional annotation (Prokka), and homologous sequence retrieval (MMseqs2). Gene Surfing offers scalability, reproducibility, and efficiency, addressing these key challenges in enzyme identification. Validation covered three enzyme classes. Cellulose-degrading enzymes (GH5 family): 1,311,316 potential lignocellulolytic enzyme sequences were identified, with 127 sequences functionally validated (84.25 % activity rate). Polyethylene-degrading enzymes: 705 candidate sequences were found, 38 of which were heterologously expressed, showing an 81.5 % activity rate (31/38). Endonucleases (HNH superfamily): 585 potential sequences were retrieved, and 4 of the 7 tested showed activity (57.1 % success rate).
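The order of the modules named in the abstract can be illustrated with a minimal Python sketch that only assembles the command line for each stage and does not run anything. It is not the Gene Surfing implementation: the function name `build_commands`, the directory layout, the file names, and the choice of options are all assumptions made for illustration, and the actual workflow is defined as Snakemake rules that the abstract does not show.

```python
from pathlib import Path


def build_commands(sample, r1, r2, query_faa, outdir="results"):
    """Return (stage, argv) pairs in the stage order listed in the abstract.

    Hypothetical helper: paths and options are illustrative assumptions,
    not the settings used by Gene Surfing.
    """
    out = Path(outdir) / sample
    clean1 = (out / "qc" / f"{sample}_R1.clean.fq.gz").as_posix()
    clean2 = (out / "qc" / f"{sample}_R2.clean.fq.gz").as_posix()
    asm_dir = (out / "megahit").as_posix()
    # MEGAHIT writes its assembly to final.contigs.fa inside the output directory.
    contigs = (out / "megahit" / "final.contigs.fa").as_posix()
    ann_dir = (out / "prokka").as_posix()
    # Prokka writes predicted proteins to <prefix>.faa inside its output directory.
    proteins = (out / "prokka" / f"{sample}.faa").as_posix()
    hits = (out / "mmseqs" / f"{sample}_hits.m8").as_posix()
    tmp = (out / "mmseqs" / "tmp").as_posix()

    return [
        # 1. Read quality control on paired-end input.
        ("quality_control",
         ["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2]),
        # 2. Metagenome assembly from the cleaned reads.
        ("assembly",
         ["megahit", "-1", clean1, "-2", clean2, "-o", asm_dir]),
        # 3. Assembly evaluation.
        ("assembly_evaluation",
         ["metaquast.py", contigs, "-o", (out / "metaquast").as_posix()]),
        # 4. Gene prediction and functional annotation of the contigs.
        ("annotation",
         ["prokka", "--metagenome", "--outdir", ann_dir,
          "--prefix", sample, contigs]),
        # 5. Homology search: reference enzymes (query) against predicted proteins.
        ("homology_search",
         ["mmseqs", "easy-search", query_faa, proteins, hits, tmp]),
    ]


if __name__ == "__main__":
    # Example inputs are placeholders, not files from the study.
    for stage, argv in build_commands(
        "S1", "S1_R1.fq.gz", "S1_R2.fq.gz", "gh5_refs.faa"
    ):
        print(f"{stage}: {' '.join(argv)}")
```

In a Snakemake workflow each stage would typically become one rule whose declared outputs are the inputs of the next, which is what lets the workflow engine rerun only the steps whose inputs changed.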