PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest.

Affiliations

College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.

Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia.

Publication information

Interdiscip Sci. 2022 Sep;14(3):697-711. doi: 10.1007/s12539-022-00520-4. Epub 2022 Apr 30.

DOI:10.1007/s12539-022-00520-4
PMID:35488998
Abstract

Promoters are short DNA sequences that play vital roles in initiating gene transcription. However, identifying promoters in a high-throughput manner with conventional experimental techniques remains a challenge. To this end, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we propose a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was developed from various deep features learned by a pre-trained deep learning network model, together with sequence-derived features. XGBoost-based feature selection was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and an independent test demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
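The abstract does not say which sequence-derived features were used or how the sources were fused, so the following is only a minimal sketch under stated assumptions. It uses normalized k-mer frequencies as a stand-in sequence-derived feature, fuses feature sources by plain concatenation, and builds a shuffled fivefold split. The function names (`kmer_frequencies`, `fuse_features`, `five_fold_indices`) and the placeholder "deep feature" values are hypothetical. The XGBoost selection step and the cascade deep forest are deliberately left out.

```python
from itertools import product
import random


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies of a DNA sequence, in fixed ACGT order.

    Assumption: k-mer composition stands in for the paper's unspecified
    sequence-derived features.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def fuse_features(*feature_vectors):
    """Fuse several feature sources by simple concatenation (an assumption)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Toy sequence and placeholder values standing in for the output of a
    # pre-trained network; neither comes from the paper.
    sequence_features = kmer_frequencies("TTGACAATATTATAAT", k=2)
    deep_features = [0.1, 0.2, 0.3]
    fused = fuse_features(deep_features, sequence_features)
    print(len(fused))  # 3 placeholder values + 16 dinucleotide frequencies
    for train_idx, test_idx in five_fold_indices(10):
        print(len(train_idx), len(test_idx))
```

In a fuller pipeline, the fused vectors for each training fold would go to a feature selector and then to a classifier, with the held-out fold used only for evaluation.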


Similar articles

1
PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest.
Interdiscip Sci. 2022 Sep;14(3):697-711. doi: 10.1007/s12539-022-00520-4. Epub 2022 Apr 30.
2
DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning.
IEEE J Biomed Health Inform. 2022 Oct;26(10):5258-5266. doi: 10.1109/JBHI.2022.3193224. Epub 2022 Oct 4.
3
An Extensive Examination of Discovering 5-Methylcytosine Sites in Genome-Wide DNA Promoters Using Machine Learning Based Approaches.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):87-94. doi: 10.1109/TCBB.2021.3082184. Epub 2022 Feb 3.
4
BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection.
Comput Biol Chem. 2022 Aug;99:107732. doi: 10.1016/j.compbiolchem.2022.107732. Epub 2022 Jul 14.
5
A Feature Fusion Predictor for RNA Pseudouridine Sites with Particle Swarm Optimizer Based Feature Selection and Ensemble Learning Approach.
Curr Issues Mol Biol. 2021 Nov 1;43(3):1844-1858. doi: 10.3390/cimb43030129.
6
iPSW(2L)-PseKNC: A two-layer predictor for identifying promoters and their strength by hybrid features via pseudo K-tuple nucleotide composition.
Genomics. 2019 Dec;111(6):1785-1793. doi: 10.1016/j.ygeno.2018.12.001. Epub 2018 Dec 5.
7
Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework.
Brief Bioinform. 2021 Mar 22;22(2):2126-2140. doi: 10.1093/bib/bbaa049.
8
Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.
Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.
9
TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.
Amino Acids. 2016 Nov;48(11):2533-2547. doi: 10.1007/s00726-016-2274-4. Epub 2016 Jun 14.
10
PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence.
Genes (Basel). 2023 Jul 13;14(7):1441. doi: 10.3390/genes14071441.

Cited by

1
iProL: identifying DNA promoters from sequence information based on Longformer pre-trained model.
BMC Bioinformatics. 2024 Jun 25;25(1):224. doi: 10.1186/s12859-024-05849-9.
2
Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns.
Sci Rep. 2024 Apr 24;14(1):9466. doi: 10.1038/s41598-024-57457-5.
