Detection of RNA structures in porcine EST data and related mammals.

Author information

Seemann Stefan E, Gilchrist Michael J, Hofacker Ivo L, Stadler Peter F, Gorodkin Jan

Affiliation

Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark.

Publication information

BMC Genomics. 2007 Sep 10;8:316. doi: 10.1186/1471-2164-8-316.

Abstract

BACKGROUND

Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. In recent years, there have been increasing reports of polyadenylated ncRNAs and mRNA-like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource (http://pigest.ku.dk), which also contains expression information distributed across 97 non-normalized cDNA libraries.

RESULTS

We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline uses sequence similarity to ncRNA databases (blast), structural similarity to Rfam (RaveNnA), and multiple alignments to predict conserved, novel putative RNA structures (RNAz). EST2ncRNA was fed with the 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline, we identified known RNA structures in 137 contigs and single reads (conreads) and predicted high-confidence RNA structures in the non-protein-coding regions of an additional 1,262 conreads. Of these, the structures in 270 conreads overlap with existing predictions in human. In summary, the PigEST resource contains trans-acting elements (ncRNAs) in 715 contigs and 340 singletons, and cis-acting elements (inside UTRs) in 311 contigs and 51 singletons; 18 conreads contain predictions of both trans- and cis-acting elements. Comparing the predicted RNAz candidates with the PigEST expression information, we identified 114 contigs that have an RNAz prediction and are expressed in at least ten of the non-normalized cDNA libraries. We conclude that contigs with RNAz predictions and those with known predictions are, in general, expressed at a much lower level than protein-coding transcripts. We also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest proportion of ncRNA candidates, about two percent. These observations relate to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of the porcine coding transcripts (of 18,600 identified) and less than one-third of the ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding matches and 10% of the remaining matches are unique between the PigEST data and the cow genome. Based on the pig-cow alignments, we searched for similarities to 16 other organisms using the alignments available from UCSC, which resulted, for instance, in 87% coverage by the human genome.
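The expression comparison described above can be sketched roughly as follows. This is a minimal illustration, not the EST2ncRNA code. It assumes a hypothetical table that maps each contig to its EST count in each cDNA library, and every function name and all of the toy data are made up. The sketch counts the libraries in which a contig is observed and keeps the candidates that appear in at least ten libraries, the threshold quoted above. It also estimates, for each library, what share of the expressed contigs are ncRNA candidates, and it compares median total EST counts as a crude proxy for expression level.

```python
from collections.abc import Mapping
from statistics import median

# Hypothetical input format (an assumption, not the PigEST schema):
# contig ID -> {cDNA library ID -> number of ESTs from that library}.
ExpressionTable = Mapping[str, Mapping[str, int]]


def libraries_expressing(counts: Mapping[str, int]) -> int:
    """Count the cDNA libraries that contributed at least one EST."""
    return sum(1 for n in counts.values() if n > 0)


def broadly_expressed(expr: ExpressionTable, candidates: set[str],
                      min_libraries: int = 10) -> list[str]:
    """Candidate contigs observed in at least `min_libraries` libraries."""
    return sorted(c for c in candidates
                  if c in expr and libraries_expressing(expr[c]) >= min_libraries)


def candidate_fraction_per_library(expr: ExpressionTable,
                                   candidates: set[str]) -> dict[str, float]:
    """Per library: share of expressed contigs that are ncRNA candidates."""
    expressed: dict[str, int] = {}
    hits: dict[str, int] = {}
    for contig, counts in expr.items():
        for lib, n in counts.items():
            if n > 0:
                expressed[lib] = expressed.get(lib, 0) + 1
                if contig in candidates:
                    hits[lib] = hits.get(lib, 0) + 1
    return {lib: hits.get(lib, 0) / total for lib, total in expressed.items()}


def median_est_count(expr: ExpressionTable, contigs: set[str]) -> float:
    """Median total EST count (a crude expression proxy) over the given contigs."""
    return median(sum(expr[c].values()) for c in contigs if c in expr)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    libs = [f"lib{i:02d}" for i in range(12)]
    expr = {
        "contig_A": {lib: 1 for lib in libs},       # candidate, seen in 12 libraries
        "contig_B": {lib: 3 for lib in libs[:4]},   # candidate, seen in 4 libraries
        "contig_C": {lib: 50 for lib in libs},      # coding, highly expressed
    }
    rnaz = {"contig_A", "contig_B"}
    print(broadly_expressed(expr, rnaz))            # ['contig_A']
    print(candidate_fraction_per_library(expr, rnaz)["lib11"])  # 0.5
    print(median_est_count(expr, rnaz), median_est_count(expr, {"contig_C"}))
```

In this sketch, a library counts as expressing a contig if it contributed even a single EST. That is the simplest possible criterion, and the paper's exact definition of expression may differ.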

CONCLUSION

Besides recovering several already annotated functional RNA structures, we predicted a large number of high-confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of the predicted ncRNA candidates, together with their higher relative abundance in cDNA libraries from developmental stages, agree with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.
