Maciel Lucas F, Morales-Vicente David A, Silveira Gilbert O, Ribeiro Raphael O, Olberg Giovanna G O, Pires David S, Amaral Murilo S, Verjovski-Almeida Sergio
Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil.
Programa Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil.
Front Genet. 2019 Sep 12;10:823. doi: 10.3389/fgene.2019.00823. eCollection 2019.
Long non-coding RNAs (lncRNAs) (>200 nt) are expressed at levels lower than those of the protein-coding mRNAs, and in all eukaryotic model species where they have been characterized, they are transcribed from thousands of different genomic loci. In humans, some four dozen lncRNAs have been studied in detail, and they have been shown to play important roles in transcriptional regulation, acting in conjunction with transcription factors and epigenetic marks to modulate the tissue-type specific programs of transcriptional gene activation and repression. In Schistosoma mansoni, around 10,000 lncRNAs have been identified in previous works. However, the limited number of RNA-sequencing (RNA-seq) libraries that had been previously assessed, together with the use of old and incomplete versions of the genome and protein-coding transcriptome annotations, has hampered the identification of all lncRNAs expressed in the parasite. Here we have used 633 publicly available RNA-seq libraries from whole worms at different stages (n = 121), from isolated tissues (n = 24), from cell populations (n = 81), and from single cells (n = 407). We have assembled a set of 16,583 lncRNA transcripts originating from 10,024 genes, of which 11,022 are novel lncRNA transcripts, whereas the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to and 5,441 lncRNAs that have gene overlap with lncRNAs already reported in previous works. Most importantly, our more stringent assembly and filtering pipeline has identified and removed a set of 4,293 lncRNA transcripts from previous publications that were in fact derived from partially processed mRNAs with intron retention. We have used weighted gene co-expression network analyses and identified 15 different gene co-expression modules. Each parasite life-cycle stage has at least one highly correlated gene co-expression module, and each module comprises hundreds to thousands of lncRNAs and mRNAs having correlated co-expression patterns at different stages.
Inspection of the most highly connected genes within the module networks has shown that different lncRNAs are hub genes at different life-cycle stages, which places them among the most promising candidate lncRNAs for further functional characterization.
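The idea of a "hub gene" in a weighted co-expression network can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-threshold power, and it ranks genes by summed adjacency (connectivity). The gene names, the toy expression values, and the power `beta = 6` are hypothetical choices made for illustration only. Module detection, which a full weighted gene co-expression analysis also performs, is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0


def connectivity(expr, beta=6):
    """Sum of soft-thresholded adjacencies |cor|**beta for every gene.

    expr maps a gene name to its expression values across samples.
    beta is an assumed illustrative soft-threshold power.
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


def hub_genes(expr, top_n=1, beta=6):
    """Return the top_n genes ranked by connectivity, highest first."""
    k = connectivity(expr, beta)
    return sorted(k, key=k.get, reverse=True)[:top_n]


# Hypothetical toy data: four genes measured in four samples.
expr = {
    "lncRNA_A": [7, 5, 5, 3],  # co-varies with both mRNA_B and mRNA_C
    "mRNA_B": [6, 4, 6, 4],
    "mRNA_C": [6, 6, 4, 4],
    "mRNA_D": [6, 4, 4, 6],    # uncorrelated with the other three
}
k = connectivity(expr)
hubs = hub_genes(expr, top_n=1)
```

In this toy network the lncRNA is the only gene correlated with two others, so it has the highest connectivity and is returned as the hub. The uncorrelated gene has a connectivity of zero.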