Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA.
College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang, Jiangxi, People's Republic of China.
BMC Genomics. 2019 May 7;20(1):344. doi: 10.1186/s12864-019-5709-y.
Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A)-selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig.
Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified, an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported for the human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5' end extension, 3' end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions using independent pig poly(A)-selected 3'-RNA-seq data or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in more than one tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data.
These validated results show significant improvement over current pig genome annotations.
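As a rough aid to reading the percentages above, the following minimal sketch converts them into approximate counts. It uses only the totals and percentages stated in the abstract; because those percentages are rounded, the resulting counts are estimates and not figures reported by the authors. The names `approx_count`, `class_counts` and `novel_noncoding` are hypothetical and chosen only for illustration.

```python
# Totals and percentages as stated in the abstract.
TOTAL_TRANSCRIPTS = 67_746
CLASS_PERCENT = {
    "protein_coding": 60.5,
    "lncRNA": 36.2,
    "NMD": 3.3,
}
NOVEL_GENES = 10_465
NOVEL_NONCODING_PERCENT = 81.0


def approx_count(total: int, percent: float) -> int:
    """Estimate a count from a total and a rounded percentage."""
    return round(total * percent / 100)


# Approximate number of transcripts in each predicted class.
class_counts = {
    name: approx_count(TOTAL_TRANSCRIPTS, pct)
    for name, pct in CLASS_PERCENT.items()
}

# Approximate number of novel genes that are non-coding.
novel_noncoding = approx_count(NOVEL_GENES, NOVEL_NONCODING_PERCENT)

if __name__ == "__main__":
    for name, count in class_counts.items():
        print(f"{name}: ~{count}")
    print(f"novel non-coding genes: ~{novel_noncoding}")
```

Under these assumptions the estimates come to roughly 41,000 protein-coding, 24,500 long non-coding and 2,200 nonsense-mediated decay transcripts, and about 8,500 non-coding novel genes.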