Wang Yongcui, Chen Shilong, Li Wenran, Jiang Rui, Wang Yong
Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai 810008, China.
Qinghai Provincial Key Laboratory of Crop Molecular Breeding, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810008, China.
NAR Genom Bioinform. 2020 Mar 20;2(2):lqaa019. doi: 10.1093/nargab/lqaa019. eCollection 2020 Jun.
Recent RNA knockdown experiments revealed that a dozen divergent long noncoding RNAs (lncRNAs) positively regulate the transcription of genes in cis. Here, to understand the regulatory mechanism of divergent lncRNAs, we proposed a computational model, IRDL (Identify the Regulatory Divergent LncRNAs), to associate divergent lncRNAs with target genes. IRDL took advantage of the cross-tissue paired expression and chromatin accessibility data in ENCODE and a dozen experimentally validated divergent lncRNA target genes. IRDL integrated sequence similarity, co-expression and co-accessibility features, addressed the scarcity of gold-standard datasets with an incremental learning framework, and identified 446 and 977 divergent lncRNA-gene regulatory associations for mouse and human, respectively. We found that the identified divergent lncRNAs and target genes correlated well in expression and chromatin accessibility. The functional and pathway enrichment analysis suggests that divergent lncRNAs are strongly associated with developmental regulatory transcription factors. The predicted loop structure validation and canonical database search indicate a scaffold regulatory model for divergent lncRNAs. Furthermore, we computationally revealed tissue/cell-specific regulatory associations by taking the specificity of lncRNAs into account. In conclusion, IRDL provides a way to understand the regulatory mechanism of divergent lncRNAs and hints at hundreds of tissue/cell-specific regulatory associations worthy of further biological validation.
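As a rough illustration of the co-expression and co-accessibility features mentioned in the abstract, the sketch below scores one lncRNA-gene pair by Pearson correlation across tissues. It is not the IRDL implementation: the choice of Pearson correlation, the function names `pearson` and `pair_features`, and the toy profiles are all assumptions made for illustration, and the sequence similarity feature and the learning framework are omitted.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length 1-D profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # A flat profile has no defined correlation; report 0.0 instead of NaN.
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def pair_features(lnc_expr, gene_expr, lnc_access, gene_access):
    """Hypothetical feature dict for one divergent lncRNA-gene pair.

    Each argument is a per-tissue profile, with tissues in the same order.
    """
    return {
        "co_expression": pearson(lnc_expr, gene_expr),
        "co_accessibility": pearson(lnc_access, gene_access),
    }


# Toy profiles over four hypothetical tissues (not real ENCODE data).
features = pair_features(
    lnc_expr=[1.0, 2.0, 3.0, 4.0],
    gene_expr=[2.0, 4.0, 6.0, 8.0],
    lnc_access=[4.0, 3.0, 2.0, 1.0],
    gene_access=[1.0, 2.0, 3.0, 4.0],
)
print(features)
```

In this toy case the expression profiles rise together and the accessibility profiles move in opposite directions, so the two scores sit at the two extremes of the correlation range. A pair-level feature vector of this kind could then be passed to any classifier.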