Department of Biology, Miami University, Oxford, OH, 45056, USA.
Sci Rep. 2024 Aug 16;14(1):18982. doi: 10.1038/s41598-024-69262-1.
Circular RNAs (circRNAs) have been extensively explored as biomarkers for cancer detection; however, their biogenesis mechanism remains elusive. In contrast to linear splicing (LS), which produces linear transcripts, the so-called back splicing (BS) process has been proposed to explain circRNA formation. To investigate the potential mechanism of BS with a machine learning approach, we curated a high-quality dataset of BS and LS exon pairs using stringent, evidence-based filtering. Two convolutional neural network (CNN) base models with different structures for processing splice junction sequences, including motif extraction, were created and compared after extensive hyperparameter tuning. In contrast to the previous study, we were able to identify motifs corresponding to well-established BS-associated genes such as MBNL1, QKI, and ESPR2. Importantly, despite the prevalence of high false positive rates in existing circRNA detection pipelines and databases, our base models demonstrated notably high specificity (greater than 90%). To further improve model performance, a novel fast numerical method was proposed and implemented to calculate the reverse complementary matches (RCMs) crossing the two flanking regions and within each flanking region of exon pairs. Our CircCNNs framework, which incorporates RCM information into the optimal base models, further reduced the false positive rate, leading to 88% prediction accuracy.
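To make the RCM idea concrete, the following is a minimal sketch of counting reverse complementary k-mer matches across and within two flanking sequences. It is not the fast numerical method proposed in the paper, which the abstract does not describe; it is a plain k-mer index lookup. The function names, the choice of k = 5, and the assumption that sequences contain only A, C, G, and T are all illustrative.

```python
from collections import defaultdict

# Assumption: sequences are DNA strings over A/C/G/T only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_positions(seq: str, k: int) -> dict:
    """Map each k-mer in seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def count_rcm(seq_a: str, seq_b: str, k: int = 5) -> int:
    """Count position pairs (i, j) where the k-mer at i in seq_a is the
    reverse complement of the k-mer at j in seq_b."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index_b = kmer_positions(seq_b, k)
    total = 0
    for i in range(len(seq_a) - k + 1):
        target = reverse_complement(seq_a[i:i + k])
        total += len(index_b.get(target, ()))
    return total


def rcm_features(upstream: str, downstream: str, k: int = 5) -> dict:
    """Hypothetical feature set: RCM counts across the two flanking
    regions and within each one. Within-region counts are over ordered
    pairs, so a match between two distinct positions is counted twice."""
    return {
        "across": count_rcm(upstream, downstream, k),
        "within_upstream": count_rcm(upstream, upstream, k),
        "within_downstream": count_rcm(downstream, downstream, k),
    }
```

For example, `rcm_features("AAAAA", "TTTTT", k=3)` finds matches only across the two regions, since every `AAA` in the first sequence pairs with every `TTT` in the second. Counts like these could be appended to the sequence-derived features of a base model, which is the general role the abstract assigns to RCM information.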