QRNAstruct: a method for extracting secondary structural features of RNA via regression with biological activity.

Affiliation

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan.

Publication information

Nucleic Acids Res. 2022 Jul 22;50(13):e73. doi: 10.1093/nar/gkac220.

Abstract

Recent technological advances have enabled the generation of large amounts of data consisting of RNA sequences and their functional activity. Here, we propose a method for extracting secondary structure features that affect the functional activity of RNA from sequence-activity data. Given pairs of RNA sequences and their corresponding bioactivity values, our method calculates position-specific structural features of the input RNA sequences, considering every possible secondary structure of each RNA. A Ridge regression model is trained using the structural features as feature vectors and the bioactivity values as response variables. Optimized model parameters indicate how secondary structure features affect bioactivity. We used our method to extract intramolecular structural features of bacterial translation initiation sites and self-cleaving ribozymes, and the intermolecular features between rRNAs and Shine-Dalgarno sequences and between U1 RNAs and splicing sites. We not only identified known structural features but also revealed more detailed insights into structure-activity relationships than previously reported. Importantly, the datasets we analyzed here were obtained from different experimental systems and differed in size, sequence length and similarity, and number of RNA molecules involved, demonstrating that our method is applicable to various types of data consisting of RNA sequences and bioactivity values.
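The regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the position-specific structural features have already been computed (here they are replaced by synthetic values standing in for, say, the probability that each position is base-paired), and it uses the standard closed-form ridge solution in NumPy. The function name `fit_ridge`, the regularization strength `alpha`, and all data below are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc and recovers the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # center features so the intercept is not penalized
    yc = y - y_mean
    n_features = X.shape[1]
    gram = Xc.T @ Xc + alpha * np.eye(n_features)
    w = np.linalg.solve(gram, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Synthetic stand-in data (assumption): 200 sequences, 30 positions, each
# feature imitating a per-position pairing probability in [0, 1].
rng = np.random.default_rng(0)
n_seqs, n_pos = 200, 30
X = rng.uniform(0.0, 1.0, size=(n_seqs, n_pos))

# Hypothetical ground truth: pairing at position 5 lowers activity,
# pairing at position 12 raises it; all other positions have no effect.
true_w = np.zeros(n_pos)
true_w[5] = -2.0
true_w[12] = 1.5
y = X @ true_w + 0.1 * rng.normal(size=n_seqs)

w, b = fit_ridge(X, y, alpha=1.0)

# The largest-magnitude weights point to the positions whose structure
# matters most; their signs indicate the direction of the effect.
top = np.argsort(-np.abs(w))[:2]
```

In this toy setting the fitted weights play the role of the "optimized model parameters" mentioned above: a negative weight suggests that structure at that position reduces activity, and a positive weight suggests the opposite. With real data, the feature matrix would come from a secondary-structure ensemble calculation rather than from random numbers.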

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39a1/9303433/e82c830ac606/gkac220fig1.jpg
