A high-quality annotated transcriptome of swine peripheral blood.

Authors

Liu Haibo, Smith Timothy P L, Nonneman Dan J, Dekkers Jack C M, Tuggle Christopher K

Affiliations

Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, 2258 Kildee Hall, Ames, IA, 50011, USA.

USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA.

Publication information

BMC Genomics. 2017 Jun 24;18(1):479. doi: 10.1186/s12864-017-3863-7.

Abstract

BACKGROUND

High-throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data into a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly.

RESULTS

De novo and genome-guided transcriptomes of porcine whole peripheral blood were assembled with the Trinity and Cufflinks software, respectively, from 162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (6 billion initial reads from 146 RNA-seq libraries) drawn from five independent studies. We then removed low-confidence putative transcripts (PTs) from both assemblies and merged the remaining PTs into an integrated transcriptome of 132,928 PTs, of which 126,225 (95%) came from the de novo assembly and more than 91% were spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7,618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs, and transcripts from the Ensembl Sscrofa10.2 annotation. Completeness of the 5' termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. Comparison with the Ensembl transcripts showed that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had longer 5' and 3' termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were entirely missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection.
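As a quick consistency check on the figures above, the reported counts can be expressed as shares of the 132,928 PTs in the integrated transcriptome. The sketch below is only illustrative arithmetic on numbers quoted in this abstract; the function names and dictionary labels are hypothetical and are not part of the authors' pipeline.

```python
# Illustrative arithmetic only: counts are copied from the abstract,
# while the names below are hypothetical and not from the authors' code.

TOTAL_PTS = 132_928  # PTs in the integrated transcriptome

REPORTED_COUNTS = {
    "from the de novo assembly": 126_225,
    "annotated with GO terms": 68_754,
    "exon-intron junctions validated": 17_528,
    "5' termini validated by CAGE": 37_569,
    "spliced PTs missing from Sscrofa10.2": 41_838,
}


def percent_of_total(count: int, total: int = TOTAL_PTS) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def summarize(counts: dict, total: int = TOTAL_PTS) -> dict:
    """Map each label to its percentage of total, rounded to one decimal."""
    return {label: round(percent_of_total(n, total), 1) for label, n in counts.items()}


if __name__ == "__main__":
    for label, pct in summarize(REPORTED_COUNTS).items():
        print(f"{label}: {pct}% of {TOTAL_PTS:,} PTs")
```

Rounded to whole percentages, the first two entries reproduce the 95% and 52% stated in the text; the remaining shares are not given in the abstract and follow only from the quoted counts.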

CONCLUSIONS

We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec3/5483264/fb68a0fdcb8b/12864_2017_3863_Fig1_HTML.jpg
