CircCNNs, a convolutional neural network framework to better understand the biogenesis of exonic circRNAs.

Affiliation

Department of Biology, Miami University, Oxford, OH, 45056, USA.

Publication information

Sci Rep. 2024 Aug 16;14(1):18982. doi: 10.1038/s41598-024-69262-1.

Abstract

Circular RNAs (circRNAs) have been extensively explored as biomarkers for cancer detection; however, their biogenesis mechanism remains elusive. In contrast to the linear splicing (LS) involved in linear transcript formation, the so-called back splicing (BS) process has been proposed to explain circRNA formation. To investigate the potential mechanism of BS with a machine learning approach, we curated a high-quality dataset of BS and LS exon pairs using stringent, evidence-based filtering. Two convolutional neural network (CNN) base models with different structures for processing splicing junction sequences, including motif extraction, were created and compared after extensive hyperparameter tuning. In contrast to the previous study, we were able to identify motifs corresponding to well-established BS-associated genes such as MBNL1, QKI, and ESRP2. Importantly, despite the high false positive rates prevalent in existing circRNA detection pipelines and databases, our base models demonstrated notably high specificity (greater than 90%). To further improve model performance, a novel fast numerical method was proposed and implemented to calculate the reverse complementary matches (RCMs) crossing the two flanking regions and within each flanking region of exon pairs. Our CircCNNs framework, which incorporates RCM information into the optimal base models, further reduced the false positive rate, leading to 88% prediction accuracy.
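
The abstract does not spell out how the RCM calculation works, so the following is only an illustrative sketch of what counting reverse complementary matches between two flanking sequences could look like. It is not the authors' numerical method. The function names (`reverse_complement`, `kmer_positions`, `count_rcms`), the match length `k`, and the toy sequences are all assumptions made for this example; the approach shown is a plain k-mer hash index over DNA letters.

```python
from collections import defaultdict

# Translation table for DNA bases (assumption: sequences use A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_positions(seq: str, k: int) -> dict:
    """Map every k-mer in seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def count_rcms(upstream: str, downstream: str, k: int = 5) -> int:
    """Count position pairs (i, j) where upstream[i:i+k] is the
    reverse complement of downstream[j:j+k].

    Hypothetical helper: k and the exact-match criterion are
    illustrative choices, not values taken from the paper.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    down_index = kmer_positions(downstream, k)
    total = 0
    for i in range(len(upstream) - k + 1):
        target = reverse_complement(upstream[i:i + k])
        total += len(down_index.get(target, ()))
    return total


if __name__ == "__main__":
    # Toy flanking sequences (made up): the downstream one contains
    # AAGCCTT, the reverse complement of AAGGCTT in the upstream one.
    up_flank = "AAGGCTTTT"
    down_flank = "CCAAGCCTT"
    print(count_rcms(up_flank, down_flank, k=5))
```

Passing the same sequence for both arguments gives a rough count of matches within a single flanking region, although this simple version counts each pair in both orientations and also counts palindromic k-mers as matching themselves. Indexing one sequence by k-mer keeps the lookup roughly linear in sequence length rather than comparing every pair of positions.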


