sORFPred: A Method Based on Comprehensive Features and Ensemble Learning to Predict the sORFs in Plant LncRNAs.

Authors

Chen Ziwei, Meng Jun, Zhao Siyuan, Yin Chao, Luan Yushi

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.

School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China.

Publication information

Interdiscip Sci. 2023 Jun;15(2):189-201. doi: 10.1007/s12539-023-00552-4. Epub 2023 Jan 27.

Abstract

Long non-coding RNAs (lncRNAs) are important regulators of biological processes. It has recently been shown that some lncRNAs contain small open reading frames (sORFs) that can encode small peptides of no more than 100 amino acids. However, existing prediction methods are mostly applied to human and animal datasets and still suffer from limited feature representation capability. Accurate and credible prediction of sORFs with coding ability in plant lncRNAs is therefore imperative. This paper proposes a new method termed sORFPred, in which we design a model named MCSEN that combines multi-scale convolution and Squeeze-and-Excitation Networks to fully mine the distinct information embedded in sORFs, integrate and optimize multiple sequence-based and physicochemical feature descriptors, and build a two-layer prediction classifier based on a Bayesian optimization algorithm and Extra Trees. sORFPred has been evaluated on sORF datasets of three species and on a dataset of experimentally validated sORFs. Results indicate that sORFPred outperforms existing methods and achieves 97.28% accuracy, 97.06% precision, 97.52% recall, and a 97.29% F1-score on Arabidopsis thaliana, a significant improvement in prediction performance over various conventional shallow machine learning and deep learning models.
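
As a quick consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not spell out the formula, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # convention for the degenerate case, an assumption here
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Arabidopsis thaliana, in percent.
reported_precision = 97.06
reported_recall = 97.52
reported_f1 = 97.29

# Recompute F1 from the two reported inputs and compare with the reported value.
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%, reported F1 = {reported_f1:.2f}%")
```

Under that assumption, the recomputed value agrees with the reported F1-score to two decimal places.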
