Consensus promoter identification in the human genome utilizing expressed gene markers and gene modeling.

Author information

Liu Rongxiang, States David J

Affiliation

Bioinformatics Program and the Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Genome Res. 2002 Mar;12(3):462-9. doi: 10.1101/gr.198002.

Abstract

Deciphering the human genome includes locating the promoters that initiate transcription and identifying the exons of genes. Many promoter prediction programs have been proposed, but when they are applied to extended regions of the genome, most of their predictions are false-positives. The extensive collection of gene transcript sequences is an important new source of information, which has not been used previously in promoter predictions. Our approach is to enhance the specificity of predictions by restricting the genomic regions that are searched using gene transcript alignments as anchors in the genome for gene modeling. We developed a consensus promoter prediction method combining previously developed algorithms with the GENSCAN gene modeling program. Our method, CONPRO (CONsensus PROmoter), identifies promoters with very high confidence, and the predicted promoters are guaranteed to be associated with genes. On our test data set, the method correctly detects promoters for approximately half of all human genes (37%-71%), and most predictions are true promoters (85%-90%). Applying our method to the human genome and human genes from the UniGene data set, we find the promoters for 13,744 genes. Of these, 6440 are genes with a functionally cloned mRNA, and 7304 are novel genes for which only expressed sequence tags (ESTs) are available. Candidate promoters for many novel genes will be a useful resource in elucidating complex biological response mechanisms.
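The two ideas in the abstract, restricting the search to regions anchored by transcript alignments and accepting only promoter calls that several predictors agree on, can be illustrated with a toy sketch. This is not the CONPRO implementation: the `GeneModel` record, the predictor names, the upstream/downstream window sizes, the clustering distance, and the `min_support` agreement rule are all assumptions chosen for illustration, and GENSCAN output is represented only by a hypothetical 5' end coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical stand-in for a gene model anchored by an mRNA/EST alignment."""
    chrom: str
    tss: int      # 5' end of the first exon, taken as the putative start site
    strand: str   # '+' or '-'


def search_region(gene, upstream=2000, downstream=500):
    """Return the (start, end) interval around the gene's 5' end to search.

    The window sizes are illustrative assumptions, not values from the paper.
    """
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream


def consensus_promoters(gene, predictions, window=200, min_support=2):
    """Keep promoter calls near the anchored gene that at least `min_support`
    distinct predictors agree on (calls within `window` bp are grouped).

    `predictions` maps a (hypothetical) predictor name to a list of
    (chrom, position) promoter calls.
    """
    start, end = search_region(gene)
    # Step 1: discard every call outside the transcript-anchored search region.
    hits = sorted(
        (pos, name)
        for name, calls in predictions.items()
        for chrom, pos in calls
        if chrom == gene.chrom and start <= pos <= end
    )
    # Step 2: single-linkage clustering; neighbouring calls closer than
    # `window` bp fall into the same cluster.
    clusters, current = [], []
    for pos, name in hits:
        if current and pos - current[-1][0] > window:
            clusters.append(current)
            current = []
        current.append((pos, name))
    if current:
        clusters.append(current)
    # Step 3: report a cluster only if enough distinct predictors support it.
    results = []
    for cluster in clusters:
        names = {name for _, name in cluster}
        if len(names) >= min_support:
            positions = [pos for pos, _ in cluster]
            center = positions[len(positions) // 2]  # upper-median position
            results.append((center, sorted(names)))
    return results


gene = GeneModel(chrom="chr1", tss=10_000, strand="+")
predictions = {
    "predictor_a": [("chr1", 9_950), ("chr1", 50_000)],
    "predictor_b": [("chr1", 10_020)],
    "predictor_c": [("chr1", 8_500)],
}
print(consensus_promoters(gene, predictions))
# → [(10020, ['predictor_a', 'predictor_b'])]
```

In this toy run the call at position 50,000 is dropped because it lies outside the anchored region, and the call at 8,500 is dropped because only one predictor supports it. Widening the region or lowering `min_support` would be expected to raise sensitivity at the cost of specificity, which is the trade-off the abstract describes.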
