PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest.

Affiliations

College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.

Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia.

Publication information

Interdiscip Sci. 2022 Sep;14(3):697-711. doi: 10.1007/s12539-022-00520-4. Epub 2022 Apr 30.

Abstract

Promoters are short DNA sequences that play vital roles in initiating gene transcription. However, identifying promoters in a high-throughput manner with conventional experimental techniques remains a challenge. To this end, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we propose a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) combines deep features learned by a pre-trained deep learning network model with sequence-derived features. XGBoost-based feature selection was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and independent tests demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
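
The fusion and selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which sequence-derived features were used, so normalized k-mer frequencies are assumed here, fusion is assumed to be plain concatenation, and the importance scores are assumed to come from an already trained model (XGBoost in the paper). The function names `kmer_frequencies`, `fuse`, and `select_top_k` are hypothetical, and the cascade deep forest classifier itself is not shown.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies in a fixed A/C/G/T order.

    Assumption: k-mer composition stands in for the paper's
    unspecified "sequence-derived features".
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def fuse(*feature_vectors):
    """Fuse several feature sources by concatenation (an assumption)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def select_top_k(features, importances, k):
    """Keep the k features with the highest importance scores.

    The scores are supplied by the caller; in the paper they would come
    from XGBoost. Returns the kept values and their original indices,
    both in the original feature order.
    """
    if len(features) != len(importances):
        raise ValueError("features and importances must have equal length")
    ranked = sorted(range(len(features)),
                    key=lambda i: importances[i], reverse=True)[:k]
    keep = sorted(ranked)
    return [features[i] for i in keep], keep


# Usage with placeholder values (not data from the paper).
seq_features = kmer_frequencies("TATAATGCGC", k=2)   # 16 dinucleotide features
deep_features = [0.12, -0.40, 0.88]                  # hypothetical deep features
fused = fuse(seq_features, deep_features)            # 19 fused features
```

In a fuller pipeline, the indices returned by `select_top_k` would be computed once on the training set and then applied unchanged to the test sequences, so that the classifier sees the same feature subset in both phases.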
