PEGASE, INRAE, Institut Agro, 35590, Saint Gilles, France.
INRAE, BioinfOmics, GenoToul Bioinformatics facility, Sigenae, Université Fédérale de Toulouse, 31326, Castanet-Tolosan, France.
Sci Rep. 2024 Mar 19;14(1):6588. doi: 10.1038/s41598-024-56705-y.
Gene atlases for livestock are steadily improving thanks to new genome assemblies and new expression data that refine gene annotation. However, gene content varies across databases because of differences in RNA sequencing data and bioinformatics pipelines, especially for long non-coding RNAs (lncRNAs), which have higher tissue and developmental specificity and are harder to identify consistently than protein-coding genes (PCGs). As we did in 2020 for the chicken assemblies galgal5 and GRCg6a, we provide a new lncRNA-enriched gene atlas for the latest chicken assembly, GRCg7b, integrating the "NCBI RefSeq" and "EMBL-EBI Ensembl/GENCODE" reference annotations with other resources such as FAANG and NONCODE. As a result, the number of PCGs increases from 18,022 (RefSeq) and 17,007 (Ensembl) to 24,102, and that of lncRNAs from 5789 (RefSeq) and 11,944 (Ensembl) to 44,428. Using 1400 public RNA-seq transcriptomes representing 47 tissues, we provide expression evidence for 35,257 (79%) lncRNAs and 22,468 (93%) PCGs, supporting the relevance of this atlas. Further characterization, including tissue specificity, sex-differential expression and gene configurations, is provided. We also identified conserved miRNA-hosting genes with human counterparts, suggesting common function. The annotated atlas is available at gega.sigenae.org.