Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland.
Genome Biol Evol. 2023 Mar 3;15(3). doi: 10.1093/gbe/evad023.
Taxonomically restricted genes (TRGs) are unique to a defined group of organisms and may act as genetic determinants of lineage-specific biological properties. Here, we explore the TRGs of the highly diverse and economically important Bacillus bacteria by examining commonly used TRG identification parameters and data sources. We show that sequence similarity thresholds and the composition and size of the reference database significantly affect the identification process. Subsequently, we applied stringent TRG search parameters and expanded the identification procedure by incorporating an analysis of noncoding and non-syntenic regions of non-Bacillus genomes. A multiplex annotation procedure minimized the number of false-positive TRG predictions and showed that nearly one-third of the alleged TRGs could be mapped to genes missed in genome annotations. We traced the putative origin of TRGs by identifying homologous, noncoding genomic regions in non-Bacillus species and detected sequence changes that could transform these regions into protein-coding genes. In addition, our analysis indicated that Bacillus TRGs represent a specific group of genes whose sequence properties are mostly intermediate between those of genes conserved across multiple taxa and those of nonannotated peptides encoded by open reading frames.
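The dependence of TRG identification on the sequence similarity threshold can be illustrated with a minimal sketch. The abstract does not describe the actual pipeline, so everything here is an assumption made for illustration: the `Hit` record, the `candidate_trgs` function, the gene names, and the E-values are hypothetical toy data, and a gene is simply treated as a TRG candidate when it has no hit outside the in-group genus at or below the chosen E-value cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (hypothetical, simplified record)."""
    query: str     # identifier of the query gene
    taxon: str     # genus of the matched (subject) sequence
    evalue: float  # E-value reported for the alignment


def candidate_trgs(genes, hits, ingroup="Bacillus", max_evalue=1e-3):
    """Return genes that have no hit outside `ingroup` with E-value <= max_evalue.

    A hit counts as evidence of an out-group homolog only if its E-value
    is at or below the cutoff, so a more permissive (larger) cutoff
    accepts more out-group homologs and leaves fewer TRG candidates.
    """
    with_outgroup_homolog = {
        h.query
        for h in hits
        if h.taxon != ingroup and h.evalue <= max_evalue
    }
    return sorted(g for g in genes if g not in with_outgroup_homolog)


# Toy input: invented genes and hits, not data from the study.
genes = ["geneA", "geneB", "geneC"]
hits = [
    Hit("geneA", "Bacillus", 1e-50),
    Hit("geneA", "Clostridium", 1e-30),  # strong out-group hit
    Hit("geneB", "Bacillus", 1e-40),
    Hit("geneB", "Listeria", 1e-4),      # weak out-group hit
    Hit("geneC", "Bacillus", 1e-20),     # in-group hits only
]

trgs_cutoff_1e10 = candidate_trgs(genes, hits, max_evalue=1e-10)
trgs_cutoff_1e3 = candidate_trgs(genes, hits, max_evalue=1e-3)

print(trgs_cutoff_1e10)  # ['geneB', 'geneC']
print(trgs_cutoff_1e3)   # ['geneC']
```

In this toy example, moving the cutoff from 1e-10 to 1e-3 reclassifies `geneB` from TRG candidate to a gene with an out-group homolog, which is the kind of threshold sensitivity the abstract reports. The composition and size of the reference database would enter through the contents of `hits`, which this sketch takes as given.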