iProEP: A Computational Predictor for Predicting Promoter.

Authors

Lai Hong-Yan, Zhang Zhao-Yue, Su Zhen-Dong, Su Wei, Ding Hui, Chen Wei, Lin Hao

Affiliations

Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.

Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China; Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China.

Publication information

Mol Ther Nucleic Acids. 2019 Sep 6;17:337-346. doi: 10.1016/j.omtn.2019.05.028. Epub 2019 Jun 13.

DOI: 10.1016/j.omtn.2019.05.028
PMID: 31299595
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6616480/
Abstract

A promoter is a fundamental DNA element located around the transcription start site (TSS) that regulates gene transcription. Promoter recognition is important for determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene function. Many models have been proposed to predict promoters, but their performance still needs improvement. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with a position-correlation scoring function (PCSF) to represent promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). The Minimum Redundancy Maximum Relevance (mRMR) algorithm and an incremental feature selection strategy were then used to identify optimal feature subsets. A support vector machine (SVM) was used to distinguish promoters from non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with areas under the receiver operating characteristic curve (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established and can be freely accessed (http://lin-group.cn/server/iProEP/).
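
Two of the steps named in the abstract can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. The function names `ktuple_composition` and `incremental_feature_selection` are hypothetical. The first computes only the plain k-tuple frequency part of PseKNC and omits the pseudo (physicochemical correlation) terms and the PCSF. The second assumes the features have already been ranked (for example by mRMR) and that the caller supplies an `evaluate` function, such as cross-validated SVM accuracy.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def ktuple_composition(seq: str, k: int = 2) -> Dict[str, float]:
    """Normalized frequency of every k-tuple over the alphabet ACGT.

    Only the composition part of PseKNC is covered; the pseudo
    (correlation) components are deliberately left out of this sketch.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ... prefixes of a ranked feature list.

    Returns the first prefix that reaches the highest score, together
    with that score. `evaluate` is supplied by the caller (assumption:
    higher is better, e.g. cross-validated accuracy).
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for n in range(1, len(ranked) + 1):
        subset = list(ranked[:n])
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

For example, `ktuple_composition("ACGTAC", 2)` returns a 16-entry dictionary in which the dinucleotide `AC` has frequency 0.4, since it occupies two of the five overlapping windows. In practice, `evaluate` would train and cross-validate a classifier on the selected columns, which this sketch leaves to the caller.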

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2a9/6616480/95351520059d/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2a9/6616480/0dcc15a04b4d/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2a9/6616480/2e2b459f4e00/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2a9/6616480/082d94999d8f/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2a9/6616480/c3a52f219613/gr5.jpg