Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia.
Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC 3800, Australia.
Int J Mol Sci. 2023 Aug 1;24(15):12320. doi: 10.3390/ijms241512320.
Biodiversity within the animal kingdom is associated with extensive molecular diversity. The expansion of genomic, transcriptomic and proteomic data sets for invertebrate groups and species with unique biological traits necessitates reliable in silico tools for the accurate identification and annotation of molecules and molecular groups. However, conventional tools are inadequate for lesser-known organismal groups, such as eukaryotic pathogens (parasites), so improved approaches are urgently needed. Here, we established a combined sequence- and structure-based workflow system that harnesses well-curated, publicly available data sets and resources to identify, classify and annotate proteases and protease inhibitors of Haemonchus contortus (barber's pole worm), a highly pathogenic parasitic roundworm (nematode) of global relevance. This workflow performed markedly better than conventional, sequence-based classification and annotation alone and allowed the first genome-wide characterisation of protease and protease inhibitor genes and gene products in this worm. In total, we identified 790 genes encoding 860 proteases and protease inhibitors representing 83 gene families. The proteins inferred included 280 metallo-, 145 cysteine, 142 serine, 121 aspartic and 81 "mixed" proteases as well as 91 protease inhibitors, all of which had marked physicochemical diversity and inferred involvement in >400 biological processes or pathways. A detailed investigation revealed a remarkable expansion of some protease or inhibitor gene families, which are likely linked to parasitism (e.g., host-parasite interactions, immunomodulation and blood-feeding) and exhibit stage- or sex-specific transcription profiles. This investigation provides a solid foundation for detailed explorations of the structures and functions of proteases and protease inhibitors of H. contortus and related nematodes, and it could assist in the discovery of new drug or vaccine targets against infections or diseases.