Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China.
BMC Genomics. 2024 May 1;25(1):430. doi: 10.1186/s12864-024-10316-z.
Although multiple chicken genomes have been assembled and annotated, the number of protein-coding genes in chicken genomes and its variation among breeds remain uncertain, owing to the low quality of these genome assemblies and the limited resources used in their gene annotations. To fill these gaps, we recently assembled the genomes of four indigenous chicken breeds with distinct traits at the chromosome level. In this study, we annotated genes in each of these assembled genomes using a combination of RNA-seq- and homology-based approaches.
We identified varying numbers (17,497-17,718) of protein-coding genes in the four indigenous chicken genomes, and recovered 51 of the 274 "missing" genes in birds in general and 36 of the 174 "missing" genes in chickens in particular. Intriguingly, based on deeply sequenced RNA-seq data collected from multiple tissues of the four breeds, we found 571-627 protein-coding genes in each genome that were missing from the annotations of the reference chicken genomes (GRCg6a and GRCg7b/w). After removing redundancy, we obtained a total of 1,420 newly annotated genes (NAGs). The NAGs tend to be located in subtelomeric regions of macro-chromosomes (chr1 to chr5, plus chrZ) and middle chromosomes (chr6 to chr13, plus chrW), as well as in micro-chromosomes (chr14 to chr39) and unplaced contigs, where G/C contents are high. Moreover, the NAGs have elevated G-quadruplex frequencies, and both the G/C contents and the G-quadruplex frequencies of their surrounding regions are also high. The NAGs showed tissue-specific expression, and we were able to verify 39 (92.9%) of 42 randomly selected ones in various tissues of the four chicken breeds using RT-qPCR experiments. Most of the NAGs were also encoded in the reference chicken genomes; thus, these genomes might harbor more genes than previously thought.
The NAGs are widely distributed in wild, indigenous and commercial chickens, and they might play critical roles in chicken physiology. Counting these new genes, chicken genomes harbor more genes than originally thought.
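As a rough illustration of the sequence statistics mentioned in the results, the sketch below computes the G/C content of a sequence, counts matches to a commonly used G-quadruplex motif, and reproduces the reported RT-qPCR validation rate. The function names and the motif pattern (four runs of at least three G separated by loops of 1 to 7 nucleotides) are assumptions made for illustration; the abstract does not state how the authors measured these quantities.

```python
import re

# Commonly used G-quadruplex motif: four runs of >=3 G, separated by
# loops of 1-7 nucleotides. ASSUMPTION: chosen for illustration only;
# the abstract does not specify the authors' detection method.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_g4_motifs(seq: str) -> int:
    """Count non-overlapping matches of the assumed G4 motif (one strand)."""
    return len(G4_PATTERN.findall(seq.upper()))


def validation_rate(verified: int, tested: int) -> float:
    """Percentage of tested genes that were verified, to one decimal."""
    return round(100 * verified / tested, 1)


if __name__ == "__main__":
    toy = "GGGAGGGTGGGAGGG"  # hypothetical toy sequence, not from the study
    print(gc_content(toy))           # G/C fraction of the toy sequence
    print(count_g4_motifs(toy))      # number of motif matches
    print(validation_rate(39, 42))   # 92.9, as reported in the abstract
```

This sketch scans only the given strand; a genome-wide analysis would normally also scan the reverse complement and summarize the counts per window or per gene.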