Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species.

Author information

Huang Wen-Lin, Tung Chun-Wei, Liaw Chyn, Huang Hui-Ling, Ho Shinn-Ying

Affiliations

Department of Management Information System, Asia Pacific Institute of Creativity, Miaoli 351, Taiwan.

School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan.

Publication information

ScientificWorldJournal. 2014;2014:327306. doi: 10.1155/2014/327306. Epub 2014 Jan 29.

Abstract

Rapid and reliable identification of promoter regions is important because the number of genomes to be sequenced is growing quickly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD uses an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs, which yield test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules for promoter prediction by using a decision tree mechanism. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor (the length of nucleotide A at the first position of the sequence) and two physicochemical properties (absorption maxima and molecular weight) are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
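To make the idea of 4-mer motif descriptors and if-then rules more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not PromHD's implementation: the abstract does not say how the descriptors are computed or ranked, so the sketch simply takes relative frequencies of overlapping 4-mers over the alphabet ACGT. The function names `kmer_frequencies` and `apply_rule`, the example sequence, and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every possible k-mer over ACGT (assumed encoding)."""
    seq = seq.upper()
    valid = set(ALPHABET)
    # Count overlapping windows; skip windows containing ambiguous bases such as N.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def apply_rule(features: dict[str, float], kmer: str, threshold: float) -> str:
    """Hypothetical single-condition if-then rule; the threshold is made up."""
    if features[kmer] > threshold:
        return "promoter"
    return "non-promoter"


features = kmer_frequencies("ACGTACGT")
print(len(features))      # 4**4 = 256 possible 4-mers
print(features["ACGT"])   # 2 of the 5 overlapping windows -> 0.4
print(apply_rule(features, "ACGT", 0.3))
```

For k = 4 there are 256 possible motifs; according to the abstract, PromHD keeps the 128 top-ranked ones, and its rules come from a decision tree over the selected feature subsets rather than from a hand-picked threshold like the one above.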
