Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data.

Affiliation

Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA.

Publication information

Genome Res. 2011 Feb;21(2):276-85. doi: 10.1101/gr.110189.110. Epub 2010 Dec 22.

Abstract

We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid levels. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.
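As a reading aid for the figures quoted above, the following minimal Python sketch shows how an AUC can be computed from classifier scores and what the 91% and 14-of-15 figures imply numerically. It is not the incRNA implementation: the function name `auc` and the toy scores are illustrative assumptions, and the count of 7000 is the rounded value from the abstract, so the derived number is only approximate.

```python
def auc(scores_pos, scores_neg):
    """Return the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count half).
    This pairwise definition is equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores for illustration only (not data from the paper).
toy_ncrna_scores = [0.9, 0.8, 0.4]       # hypothetical known ncRNAs
toy_other_scores = [0.7, 0.3, 0.2, 0.1]  # hypothetical coding/other elements
toy_auc = auc(toy_ncrna_scores, toy_other_scores)

# Figures taken from the abstract; 7000 is a rounded candidate count.
expected_true_positives = round(0.91 * 7000)
rtpcr_detection_rate = 14 / 15

print(f"toy AUC: {toy_auc:.3f}")
print(f"expected true positives: about {expected_true_positives}")
print(f"RT-PCR detection rate: {rtpcr_detection_rate:.1%}")
```

Under these assumptions, the 91% estimate corresponds to roughly 6370 true candidates, and the RT-PCR result corresponds to a detection rate of about 93% on the 15 tested candidates.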
