Department of Biotechnology, Hankyong National University, Anseong, Republic of Korea.
BMC Genomics. 2012 Sep 12;13:473. doi: 10.1186/1471-2164-13-473.
Thoroughbred horses are the most expensive domestic animals, and understanding their running ability and muscle-related diseases is important in animal genetics. Although a horse reference genome is available, there has been no large-scale functional annotation of the genome using expressed genes derived from transcriptomes.
We present a large-scale analysis of whole transcriptome data. We sequenced the whole mRNA from the blood and muscle tissues of six thoroughbred horses before and after exercise. By comparison with current genome annotations, we identified 32,361 unigene clusters spanning 51.83 Mb that contained 11,933 (36.87%) annotated genes. More than 60% (20,428) of the unigene clusters did not match any current equine gene model. We also identified 189,973 single nucleotide variations (SNVs) from the sequences aligned against the horse reference genome. Most SNVs (171,558 SNVs; 90.31%) were novel when compared with over 1.1 million equine SNPs from two SNP databases. Using differential expression analysis, we further identified a number of exercise-regulated genes: 62 up-regulated and 80 down-regulated genes in the blood, and 878 up-regulated and 285 down-regulated genes in the muscle. Six of 28 previously known exercise-related genes were over-expressed in the muscle after exercise. Among the differentially expressed genes, there were 91 transcription factor-encoding genes, which included 56 functionally unknown transcription factor candidates that are probably associated with an early regulatory exercise mechanism. In addition, we found interesting RNA expression patterns in which different alternative splicing forms of the same gene showed reversed expression before and after exercise.
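As a quick consistency check on the proportions reported above, the following minimal Python sketch recomputes them from the stated counts. The variable names and the `percent` helper are illustrative assumptions and are not part of the study's own analysis pipeline.

```python
# Counts as reported in the abstract (variable names are illustrative).
unigene_clusters = 32_361
annotated_clusters = 11_933
unmatched_clusters = 20_428
snvs_total = 189_973
snvs_novel = 171_558


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


annotated_pct = percent(annotated_clusters, unigene_clusters)
unmatched_pct = percent(unmatched_clusters, unigene_clusters)
novel_snv_pct = percent(snvs_novel, snvs_total)

# Totals of differentially expressed genes per tissue (up + down).
blood_deg_total = 62 + 80
muscle_deg_total = 878 + 285

print(f"annotated clusters: {annotated_pct}%")
print(f"unmatched clusters: {unmatched_pct}%")
print(f"novel SNVs:         {novel_snv_pct}%")
print(f"DEGs in blood: {blood_deg_total}, in muscle: {muscle_deg_total}")
```

The annotated and unmatched cluster counts sum to the total number of unigene clusters, so the "more than 60%" figure is simply the complement of the 36.87% annotated fraction.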
This study provides the first sequencing-based horse transcriptome data, extensive analysis results, genes differentially expressed before and after exercise, and candidate genes related to exercise.