Structural annotation of equine protein-coding genes determined by mRNA sequencing.

Author information

Coleman S J, Zeng Z, Wang K, Luo S, Khrebtukova I, Mienaltowski M J, Schroth G P, Liu J, MacLeod J N

Affiliation

Department of Veterinary Science, Maxwell H. Gluck Equine Research Center, University of Kentucky, Lexington, KY 40546, USA.

Publication information

Anim Genet. 2010 Dec;41 Suppl 2:121-30. doi: 10.1111/j.1365-2052.2010.02118.x.

Abstract

The horse, like the majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for the majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equalling 10.28 Gbp of total sequence data. The tag alignments represent approximately 207× coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to refine the structural annotation for 11,356 of these predicted genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples.
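The total sequence figure in the abstract can be checked from the tag count and tag length it reports. The short sketch below is only an arithmetic check, not part of the authors' analysis; the variable names are illustrative, and it assumes 1 Gbp means 10^9 bases and a tag count of 293,758,105.

```python
# Arithmetic check of the total sequence yield stated in the abstract.
# Assumption: 1 Gbp = 1e9 bases; variable names are illustrative only.
TAG_COUNT = 293_758_105   # sequence tags reported for the eight tissues
TAG_LENGTH = 35           # bases per tag

total_bases = TAG_COUNT * TAG_LENGTH
total_gbp = total_bases / 1e9

print(f"{total_bases:,} bases = {total_gbp:.2f} Gbp")
```

The product rounds to the 10.28 Gbp quoted in the abstract.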
