Chaabane Mohamed, Williams Robert M, Stephens Austin T, Park Juw Won
Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40208, USA.
KBRIN Bioinformatics Core, University of Louisville, Louisville, KY 40208, USA.
Bioinformatics. 2020 Jan 1;36(1):73-80. doi: 10.1093/bioinformatics/btz537.
Over the past two decades, circular RNA (circRNA), a circular form of RNA produced through alternative splicing, has become a focus of scientific study because of its major role as a modulator of microRNA (miRNA) activity and its association with various diseases, including cancer. Detecting circular RNAs is therefore vital to understanding their biogenesis and function. Prediction of circular RNA can be achieved in three steps: distinguishing non-coding RNAs from protein-coding gene transcripts, separating short from long non-coding RNAs, and distinguishing circular RNAs from other long non-coding RNAs (lncRNAs). However, the available tools are less than 80 percent accurate at the last step, distinguishing circular RNAs from other lncRNAs, because the classification is difficult. A faster and more accurate machine learning method for identifying circular RNAs, one that considers the specific features of circular RNA, is therefore essential to the development of systematic annotation.
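The three-step prediction described above can be pictured as a simple cascade. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `classify_transcript`, the predicates `is_coding` and `is_circular`, and the 200-nucleotide cutoff (the conventional lower bound for lncRNAs) are all illustrative and do not come from the abstract.

```python
from typing import Callable

# Assumption: 200 nt is the conventional length cutoff for "long" non-coding RNA.
# The abstract does not state which cutoff is used.
LONG_NCRNA_MIN_LENGTH = 200


def classify_transcript(
    seq: str,
    is_coding: Callable[[str], bool],
    is_circular: Callable[[str], bool],
    min_long_length: int = LONG_NCRNA_MIN_LENGTH,
) -> str:
    """Route one transcript through the three steps in order.

    `is_coding` and `is_circular` are hypothetical stand-ins for trained
    classifiers; the caller supplies them.
    """
    # Step 1: separate protein-coding transcripts from non-coding RNAs.
    if is_coding(seq):
        return "protein-coding"
    # Step 2: separate short from long non-coding RNAs by length.
    if len(seq) < min_long_length:
        return "short non-coding"
    # Step 3: distinguish circular RNAs from other lncRNAs.
    return "circRNA" if is_circular(seq) else "other lncRNA"


# Toy predicates for illustration only; real classifiers would replace them.
def toy_is_coding(seq: str) -> bool:
    return seq.startswith("ATG")


def toy_is_circular(seq: str) -> bool:
    return seq.endswith("GT")
```

In this sketch only the third step would be handled by a tool such as circDeep; the first two steps are shown to make the position of that step in the cascade explicit.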
Here we present circDeep, an end-to-end deep learning framework for distinguishing circular RNAs from other lncRNAs. circDeep fuses an RCM descriptor, an ACNN-BLSTM sequence descriptor and a conservation descriptor into high-level abstraction descriptors in which the shared representations across the different modalities are integrated. Experiments show that circDeep is not only faster than existing tools but also reaches an unprecedented level of accuracy, with a 12 percent increase in accuracy over the other tools.
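To make the idea of fusing several descriptors concrete, here is a minimal sketch of one common approach: concatenating the per-modality feature vectors and passing them through a single dense layer with a ReLU activation. This is an assumption for illustration only. The abstract does not specify how circDeep combines its descriptors, and the function names, vector sizes and weights below are hypothetical.

```python
from typing import List, Sequence


def fuse_descriptors(
    rcm: Sequence[float],
    sequence_desc: Sequence[float],
    conservation: Sequence[float],
) -> List[float]:
    """Concatenate three descriptor vectors into one joint feature vector."""
    return [*rcm, *sequence_desc, *conservation]


def dense_relu(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Apply one fully connected layer followed by ReLU.

    `weights` holds one row per output unit; every row must match len(x).
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    out = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match input length")
        pre_activation = sum(w * v for w, v in zip(row, x)) + b
        out.append(max(0.0, pre_activation))  # ReLU
    return out


# Hypothetical toy values: 2 RCM features, 1 sequence feature, 2 conservation features.
fused = fuse_descriptors([1.0, 2.0], [0.5], [0.0, 1.0])
hidden = dense_relu(
    fused,
    weights=[[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0, 0.0]],
    bias=[0.0, 0.0],
)
```

In a trained model the weights would be learned jointly across modalities, which is what allows a shared representation to emerge; here they are fixed by hand only to keep the example self-contained.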
circDeep is available at https://github.com/UofLBioinformatics/circDeep.
Supplementary data are available at Bioinformatics online.