Pacholewska Alicja, Drögemüller Michaela, Klukowska-Rötzler Jolanta, Lanz Simone, Hamza Eman, Dermitzakis Emmanouil T, Marti Eliane, Gerber Vincent, Leeb Tosso, Jagannathan Vidhya
Swiss Institute of Equine Medicine, Vetsuisse Faculty, University of Bern and Agroscope, Bern, Switzerland; Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland.
Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland.
PLoS One. 2015 Mar 19;10(3):e0122011. doi: 10.1371/journal.pone.0122011. eCollection 2015.
Complete transcriptomic data at high resolution are available only for a few model organisms with medical importance. The gene structures of non-model organisms are mostly computationally predicted based on comparative genomics with other species. As a result, more than half of the horse gene models are known only by projection. Experimental data supporting these gene models are scarce. Moreover, most of the annotated equine genes are single-transcript genes. With RNA sequencing (RNA-seq), experimental validation of predicted transcriptomes has become feasible at reasonable cost. To improve the horse genome annotation, we performed RNA-seq on 561 samples of peripheral blood mononuclear cells (PBMCs) derived from 85 Warmblood horses. The mapped sequencing reads were used to build a new transcriptome assembly. The new assembly revealed many alternative isoforms associated with known genes or with genes predicted by the Ensembl and/or Gnomon pipelines. We also identified 7,531 transcripts not associated with any horse gene annotated in public databases. Of these, 3,280 transcripts had no homologous match to any sequence deposited in the NCBI EST database, suggesting horse specificity. The unknown transcripts were categorized as coding or noncoding based on predicted coding potential scores. Among them, 230 transcripts had a high coding potential score, at least 2 exons, and an open reading frame of at least 300 nt. We experimentally validated 9 new equine coding transcripts using RT-PCR and Sanger sequencing. Our results provide valuable detailed information on many transcripts yet to be annotated in the horse genome.
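The three criteria applied to the unknown transcripts (high coding potential score, at least 2 exons, open reading frame of at least 300 nt) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Transcript` record, the `longest_orf_length` helper, and in particular the score cutoff `min_score=0.5` are assumptions made for illustration, since the abstract does not state which tool or threshold defined a "high" score. The ORF search is also simplified to the three forward reading frames.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF in the three forward
    frames, stop codon included. Simplification: the reverse strand is
    not scanned."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    name: str
    sequence: str
    exon_count: int
    coding_score: float  # score from some coding-potential predictor


def is_candidate_coding(t: Transcript,
                        min_score: float = 0.5,  # assumed cutoff, not from the paper
                        min_exons: int = 2,
                        min_orf_nt: int = 300) -> bool:
    """Apply the three filters named in the abstract."""
    return (t.coding_score >= min_score
            and t.exon_count >= min_exons
            and longest_orf_length(t.sequence) >= min_orf_nt)


# Toy sequences, not real horse transcripts.
long_orf = "ATG" + "GCT" * 99 + "TAA"   # ORF of 303 nt
short_orf = "ATG" + "GCT" * 10 + "TAA"  # ORF of 36 nt

transcripts = [
    Transcript("toy_multi_exon", long_orf, exon_count=3, coding_score=0.9),
    Transcript("toy_single_exon", long_orf, exon_count=1, coding_score=0.9),
    Transcript("toy_short_orf", short_orf, exon_count=2, coding_score=0.9),
]
candidates = [t.name for t in transcripts if is_candidate_coding(t)]
```

In this toy input only `toy_multi_exon` satisfies all three filters; the other two fail on exon count and ORF length respectively.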