Department of Medicine, Huddinge, Karolinska Institutet, Huddinge, Sweden.
Department of Veterinary Biosciences, University of Helsinki, 00014, Helsinki, Finland.
Nat Commun. 2024 Oct 21;15(1):9082. doi: 10.1038/s41467-024-52798-1.
The dog, Canis lupus familiaris, is an important model for studying human diseases. Unlike the genomes of many other model organisms, the dog genome has comparatively poor functional annotation, which hampers the discovery of genes underlying development, morphology, disease, and behavior. To fill this gap, we established a comprehensive tissue biobank of dog and wolf samples. The biobank consists of 5485 samples representing 132 tissues from 13 dogs, 12 dog embryos, and 24 wolves. In a subset of 100 tissues from nine dogs and 12 embryos, we used STRT2-seq, a 5'-targeting RNA sequencing technology, to characterize gene expression activity at each promoter, including alternative and novel (i.e., previously unannotated) promoter regions. We identified over 100,000 promoter region candidates in the recent canine genome assembly, CanFam4, including over 45,000 highly reproducible sites with their respective gene expression and tissue enrichment levels. We provide a promoter and gene expression atlas with interactive, open data resources, including a data coordination center and genome browser track hubs. We demonstrate the applicability of the Dog Genome Annotation (DoGA) data and resources with multiple examples spanning canine embryonic development, morphology, behavior, and diseases across species.