NeRNA: A negative data generation framework for machine learning applications of noncoding RNAs.

Affiliations

Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gül University, Kayseri, Turkey.

Department of Engineering Science, Faculty of Engineering, Abdullah Gül University, Kayseri, Turkey.

Publication information

Comput Biol Med. 2023 Jun;159:106861. doi: 10.1016/j.compbiomed.2023.106861. Epub 2023 Apr 11.

Abstract

Many supervised machine learning-based noncoding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning datasets usually consist of known ncRNA examples, some of which may have weak or strong experimental validation. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To overcome this challenge, a novel negative data generation method, NeRNA (negative RNA), is developed in this work. NeRNA uses known examples of given ncRNA sequences and their calculated structures for octal representation to create negative sequences in a manner similar to frameshift mutations but without deletion or insertion. NeRNA is tested individually with four different ncRNA datasets: microRNA (miRNA), transfer RNA (tRNA), long noncoding RNA (lncRNA), and circular RNA (circRNA). Furthermore, a species-specific case analysis is performed to demonstrate and compare the performance of NeRNA for miRNA prediction. The results of 1000-fold cross-validation on Decision Tree, Naïve Bayes, and Random Forest classifiers, and on deep learning algorithms such as Multilayer Perceptron, Convolutional Neural Network, and simple feedforward neural networks, indicate that models obtained using NeRNA-generated datasets achieve substantially high prediction performance. NeRNA is released as an easy-to-use, updatable, and modifiable KNIME workflow that can be downloaded with example datasets and required extensions. In particular, NeRNA is designed to be a powerful tool for RNA sequence data analysis.
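The abstract does not spell out the encoding, so the following is only a rough sketch of what an octal, frameshift-like transformation without insertion or deletion could look like. Everything in it is an assumption made for illustration: the symbol table (nucleotide index times two, plus one if the position is paired in a dot-bracket structure), the one-bit rotation, and the function names `to_octal`, `rotate_bits`, and `make_negative` are hypothetical and are not taken from the NeRNA workflow.

```python
# Illustrative sketch only. The digit table and the rotation rule below are
# assumptions, not the published NeRNA procedure.
NUCLEOTIDES = "ACGU"


def to_octal(seq, structure):
    """Encode each (nucleotide, paired?) position as one octal digit 0-7."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    digits = []
    for base, mark in zip(seq.upper(), structure):
        paired = 0 if mark == "." else 1  # '(' or ')' counts as paired
        digits.append(NUCLEOTIDES.index(base) * 2 + paired)
    return digits


def rotate_bits(digits, shift=1):
    """Shift the reading frame: rotate the 3-bit-per-digit string left."""
    bits = "".join(format(d, "03b") for d in digits)
    if not bits:
        return []
    shift %= len(bits)
    rotated = bits[shift:] + bits[:shift]
    # Regroup into 3-bit chunks, so the number of digits is unchanged.
    return [int(rotated[i:i + 3], 2) for i in range(0, len(rotated), 3)]


def make_negative(seq, structure, shift=1):
    """Return a same-length sequence decoded from the shifted digits."""
    shifted = rotate_bits(to_octal(seq, structure), shift)
    return "".join(NUCLEOTIDES[d >> 1] for d in shifted)


print(make_negative("GCAU", "(..)"))  # prints CGAU
```

Because the bits are rotated rather than inserted or dropped, the output always has the same length as the input, which mirrors the "without deletion or insertion" property described above. The dot-bracket structure would come from a separate folding tool in practice.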
