
Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species.

Authors

Huang Wen-Lin, Tung Chun-Wei, Liaw Chyn, Huang Hui-Ling, Ho Shinn-Ying

Affiliations

Department of Management Information System, Asia Pacific Institute of Creativity, Miaoli 351, Taiwan.

School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan.

Publication information

ScientificWorldJournal. 2014;2014:327306. doi: 10.1155/2014/327306. Epub 2014 Jan 29.

DOI:10.1155/2014/327306
PMID:24955394
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3927563/
Abstract

The rapid and reliable identification of promoter regions is important as the number of genomes to be sequenced grows rapidly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules for promoter prediction by using a decision tree. The top-ranked informative rules with high certainty grades reveal that a global sequence descriptor (the length of nucleotide A at the first position of the sequence) and two physicochemical properties (absorption maxima and molecular weight) are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
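As a rough illustration of what a 4-mer motif descriptor might look like, the sketch below computes normalized frequencies for all 256 possible 4-mers in a DNA sequence. The function name `kmer_frequencies`, the sliding-window counting, and the normalization by window count are assumptions made for illustration. The abstract does not state how PromHD computes or ranks its 128 top-ranked 4-mer descriptors, so this is not the authors' procedure.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every possible k-mer over A/C/G/T.

    Hypothetical helper: overlapping windows are counted, and windows that
    contain characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-mers so the output has a fixed dimension.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous nucleotides.
    total = sum(counts[m] for m in all_kmers)
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT")
    print(len(freqs), freqs["ACGT"])
```

A feature-selection step such as the one the abstract describes would then keep only a subset of these 256 values; which subset, and by what ranking criterion, is not specified here.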


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bb/3927563/1e32d6433a4a/TSWJ2014-327306.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bb/3927563/6ddb89074820/TSWJ2014-327306.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bb/3927563/d115f017b7ad/TSWJ2014-327306.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bb/3927563/4754f4d26e72/TSWJ2014-327306.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bb/3927563/950e5e177882/TSWJ2014-327306.005.jpg


