Department of Microbiology, University of Manitoba, Winnipeg, MB, Canada.
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada.
Sci Rep. 2022 Jan 19;12(1):962. doi: 10.1038/s41598-022-05028-x.
A first clue to gene function can be obtained by examining whether a gene is required for life in certain standard conditions, that is, whether a gene is essential. In bacteria, essential genes are usually identified by high-density transposon mutagenesis followed by sequencing of insertion sites (Tn-seq). These studies assign the term "essential" to whole genes rather than to the protein domain sequences that encode the essential functions. However, genes can code for multiple protein domains that evolve their functions independently. Therefore, when an essential gene codes for more than one protein domain, only one of those domains may be essential. In this study, we defined this subset of genes as "essential domain-containing" (EDC) genes. Using a Tn-seq data set built in Burkholderia cenocepacia K56-2, we developed an in silico pipeline to identify EDC genes and the essential protein domains they encode. We found forty candidate EDC genes and demonstrated growth defect phenotypes using CRISPR interference (CRISPRi). This analysis included two knockdowns of genes encoding the protein domains of unknown function DUF2213 and DUF4148. These putative essential domains are conserved in more than two hundred bacterial species, including human and plant pathogens. Together, our study suggests that essentiality should be assigned to individual protein domains rather than to whole genes, contributing to a first functional characterization of protein domains of unknown function.
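The core idea behind identifying EDC genes, that a multi-domain gene can tolerate transposon insertions in one domain while another domain is depleted of them, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes domain boundaries are already known as nucleotide coordinates within the gene, scores each domain by insertions per base pair, and applies one fixed density cutoff. The names `Domain`, `insertion_density`, `classify_domains`, `is_edc_gene`, the `threshold` value, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Hypothetical representation: 1-based, inclusive nucleotide
    # coordinates of a protein domain within its gene.
    name: str
    start: int
    end: int


def insertion_density(insertions, start, end):
    """Return transposon insertions per base pair inside [start, end]."""
    length = end - start + 1
    count = sum(1 for pos in insertions if start <= pos <= end)
    return count / length


def classify_domains(insertions, domains, threshold):
    """Label a domain essential (True) if its insertion density is below
    a hypothetical fixed threshold; otherwise label it False."""
    return {
        d.name: insertion_density(insertions, d.start, d.end) < threshold
        for d in domains
    }


def is_edc_gene(insertions, domains, threshold):
    """Hypothetical EDC criterion: at least one domain is depleted of
    insertions while at least one other domain tolerates them."""
    labels = classify_domains(insertions, domains, threshold)
    return any(labels.values()) and not all(labels.values())


if __name__ == "__main__":
    # Illustrative (made-up) gene: insertions fall only in the first domain.
    insertions = [20, 55, 90, 130, 180, 240, 280]
    domains = [Domain("DomA", 1, 300), Domain("DUF_X", 301, 600)]
    print(classify_domains(insertions, domains, threshold=0.01))
    # → {'DomA': False, 'DUF_X': True}
    print(is_edc_gene(insertions, domains, threshold=0.01))
    # → True
```

In this toy gene, the whole-gene view would see insertions and might not call it essential, whereas the per-domain view flags `DUF_X` as the insertion-free, putatively essential domain. A real analysis would also need to handle sequencing depth, domain length effects, and statistical testing instead of a single cutoff.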