LMFE: A Novel Method for Predicting Plant LncRNA Based on Multi-Feature Fusion and Ensemble Learning.

Authors

Zhang Hongwei, Shi Yan, Wang Yapeng, Yang Xu, Li Kefeng, Im Sio-Kei, Han Yu

Affiliations

Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999074, China.

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Publication information

Genes (Basel). 2025 Mar 31;16(4):424. doi: 10.3390/genes16040424.

DOI:10.3390/genes16040424

PMID:40282384

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12026654/

Abstract

Background/Objectives: Long non-coding RNAs (lncRNAs) play a crucial regulatory role in plant trait expression and disease management, so predicting them accurately is a key research focus for guiding biological experiments. While lncRNAs have been studied extensively in animals and humans, plant lncRNA research remains relatively limited because of challenges such as data scarcity and genomic complexity. This study aims to bridge this gap by developing an effective computational method for predicting plant lncRNAs, specifically by classifying transcribed RNA sequences as lncRNAs or mRNAs using multi-feature analysis.

Methods: We propose the lncRNA multi-feature-fusion ensemble learning (LMFE) approach, a novel method that fuses a 100-dimensional feature vector drawn from RNA biological-property, sequence-based, and structure-based features and uses the XGBoost ensemble learning algorithm for prediction. To address unbalanced datasets, we applied the synthetic minority oversampling technique (SMOTE). LMFE was validated on benchmark datasets, cross-species datasets, unbalanced datasets, and independent datasets.

Results: LMFE achieved an accuracy of 99.42%, an F1 of 0.99, and an MCC of 0.98 on the benchmark dataset, with robust cross-species performance (accuracy ranging from 89.30% to 99.81%). On unbalanced datasets, LMFE attained an average accuracy of 99.41%, an improvement of 12.29 percentage points over traditional methods without SMOTE (average accuracy of 87.12%). On independent datasets, LMFE consistently outperformed state-of-the-art methods such as CPC2 and PLEKv2 across multiple metrics (accuracy ranging from 97.33% to 99.21%). Redundant features had minimal impact on performance.

Conclusions: LMFE provides a highly accurate and generalizable solution for plant lncRNA prediction, outperforming existing methods through multi-feature fusion and ensemble learning while remaining robust to redundant features. Despite its effectiveness, the variation in performance across species shows that future work is needed to handle diverse plant genomes. This method is a valuable tool for advancing plant lncRNA research and guiding biological experiments.
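The steps named in the abstract (feature fusion, SMOTE oversampling, and evaluation by accuracy, F1, and MCC) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not list the 100 features, so the 2-mer frequencies, GC content, and length used here are assumed stand-ins, and the function names `fuse_features`, `smote_like`, and `binary_metrics` are hypothetical. The XGBoost classifier itself is left out because it needs a third-party package.

```python
import math
import random
from itertools import product


def kmer_features(seq, k=2):
    """Normalized k-mer frequencies over ACGU (an assumed sequence feature)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G and C bases (an assumed biological-property feature)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def fuse_features(seq):
    """Fusion step: concatenate the feature groups into one flat vector."""
    return kmer_features(seq, 2) + [gc_content(seq), float(len(seq))]


def smote_like(minority, n_new, k=2, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours. Needs at least two samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, F1, and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, f1, mcc
```

In a fuller pipeline, the fused vectors for the minority class would be oversampled with `smote_like` before training, and any classifier (XGBoost in the paper) would be scored with `binary_metrics` on held-out data. MCC is worth reporting next to accuracy on unbalanced data because it uses all four confusion-matrix cells.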

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f714/12026654/1254a89f9bdd/genes-16-00424-g001.jpg
