School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, South Australia, Australia.
PLoS One. 2012;7(8):e42638. doi: 10.1371/journal.pone.0042638. Epub 2012 Aug 6.
It is apparent that non-coding transcripts are a common feature of higher organisms and encode uncharacterized layers of genetic regulation and information. We used public bovine EST data from many developmental stages and tissues, and developed a pipeline for the genome-wide identification and annotation of non-coding RNAs (ncRNAs). We predicted 23,060 bovine ncRNAs, 99% of which are unannotated in known ncRNA databases. Intergenic transcripts accounted for the majority (57%) of the predicted ncRNAs, and the occurrences of ncRNAs and genes were only moderately correlated (r = 0.55, p-value < 2.2e-16). Many of these intergenic ncRNAs mapped close to the 3' or 5' end of thousands of genes, and many were transcribed from the opposite strand with respect to the closest gene, particularly for regulatory-related genes. Conservation analyses showed that these ncRNAs were evolutionarily conserved, and many intergenic ncRNAs proximate to genes contained sequence-specific motifs. Correlation analysis of expression between these intergenic ncRNAs and protein-coding genes, using RNA-seq data from a variety of tissues, showed significant correlations with many transcripts. These results support the hypothesis that ncRNAs are common, are transcribed in a regulated fashion, and have regulatory functions.
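As an illustration of the kind of statistic reported above (r = 0.55 between ncRNA and gene occurrence), the following is a minimal sketch of a Pearson correlation computed over per-region counts. The abstract does not state how occurrence was tallied, so the idea of fixed genomic windows, the function name `pearson_r`, and the toy counts are all assumptions made for illustration; they are not the authors' pipeline or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: counts of genes and predicted ncRNAs in six
# made-up genomic windows (NOT values from the study).
gene_counts = [12, 5, 30, 8, 0, 17]
ncrna_counts = [9, 7, 14, 10, 2, 6]

r = pearson_r(gene_counts, ncrna_counts)
print(f"Pearson r between gene and ncRNA counts: {r:.2f}")
```

The same function could, under the same caveats, be applied to paired expression values of an intergenic ncRNA and a neighbouring protein-coding gene across tissues; a significance test would additionally be needed to reproduce a p-value.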