Zhong Cuncong, Andrews Justen, Zhang Shaojie
Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA.
Department of Biology, Indiana University, Bloomington, IN 47405, USA.
Int J Bioinform Res Appl. 2014;10(4-5):479-97. doi: 10.1504/IJBRA.2014.062996.
Non-coding RNA (ncRNA) elements in 3' untranslated regions (3'-UTRs) are known to participate in post-transcriptional gene regulation. Inferring gene co-expression patterns by clustering these 3'-UTR ncRNA elements can provide valuable insight into their biological functions. In this paper, we propose an improved RNA structural clustering pipeline. Benchmarking the new pipeline on Rfam data demonstrates a performance improvement of more than 10% over the traditional hierarchical clustering pipeline. By applying the new clustering pipeline to the 3'-UTRs of the Drosophila melanogaster genome, we identified 184 ncRNA clusters with 91.3% accuracy. One of these clusters corresponds to genes that are preferentially expressed in male Drosophila. Another cluster contains genes responsible for the function of the septate junction in epithelial cells. These discoveries encourage further study of novel post-transcriptional regulation mechanisms.