On the power of profiles for transcription factor binding site detection.

Authors

Rahmann Sven, Müller Tobias, Vingron Martin

Affiliations

Computational Molecular Biology, Max Planck Institute for Molecular Genetics, and Department of Mathematics and Computer Science, Freie Universität Berlin.

Publication information

Stat Appl Genet Mol Biol. 2003;2:Article7. doi: 10.2202/1544-6115.1032. Epub 2003 Nov 29.

Abstract

Transcription factor binding site (TFBS) detection plays an important role in computational biology, with applications in gene finding and gene regulation. The sites are often modeled by gapless profiles, also known as position-weight matrices. Past research has focused on the significance of profile scores (the ability to avoid false positives), but this alone is not enough: The profile must also possess the power to detect the true positive signals. Several completed genomes are now available, and the search for TFBSs is moving to a large scale; so discriminating signal from noise becomes even more challenging. Since TFBS profiles are usually estimated from only a few experimentally confirmed instances, careful regularization is an important issue. We present a novel method that is well suited for this situation. We further develop measures that help in judging profile quality, based on both sensitivity and selectivity of a profile. It is shown that these quality measures can be efficiently computed, and we propose statistically well-founded methods to choose score thresholds. Our findings are applied to the TRANSFAC database of transcription factor binding sites. The results are disturbing: If we insist on a significance level of 5% in sequences of length 500, only 19% of the profiles detect a true signal instance with 95% success probability under varying background sequence compositions.
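The interplay the abstract describes between regularization, significance, and power can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify their regularization scheme, so the code below substitutes simple background-proportional pseudocounts, rounds log-odds scores to integers so the score distribution can be computed exactly by convolution, and treats the overlapping windows of a sequence as independent when choosing a threshold. The binding sites, the function names, and the parameter values are all hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def build_profile(sites, background, pseudocount=1.0):
    """Estimate a gapless profile from aligned sites.

    Assumption: regularization uses pseudocounts proportional to the
    background (a stand-in, not the method proposed in the paper).
    Returns integer log-odds scores (tenths of a bit) and the
    regularized per-column signal probabilities.
    """
    scores, signal = [], []
    for j in range(len(sites[0])):
        counts = {a: pseudocount * background[a] for a in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        probs = {a: counts[a] / total for a in ALPHABET}
        signal.append(probs)
        scores.append({a: round(10 * math.log2(probs[a] / background[a]))
                       for a in ALPHABET})
    return scores, signal

def score_distribution(scores, column_probs):
    """Exact distribution of the profile score when column j emits
    letters according to column_probs[j] (columns independent)."""
    dist = {0: 1.0}
    for col, probs in zip(scores, column_probs):
        nxt = defaultdict(float)
        for s, p in dist.items():
            for a in ALPHABET:
                nxt[s + col[a]] += p * probs[a]
        dist = dict(nxt)
    return dist

def tail(dist, threshold):
    """Probability that the score is at least `threshold`."""
    return sum(p for s, p in dist.items() if s >= threshold)

def choose_threshold(null_dist, n_windows, alpha):
    """Smallest threshold whose chance of any false hit in n_windows
    windows is at most alpha (windows treated as independent, which
    is an approximation for overlapping windows)."""
    for t in sorted(null_dist):
        if 1 - (1 - tail(null_dist, t)) ** n_windows <= alpha:
            return t
    return max(null_dist) + 1  # no attainable score is significant

# Hypothetical aligned binding sites (illustration only).
sites = ["TGACGTCATC", "TGACGTCACC", "TGATGTCATC",
         "TGACGTAATC", "AGACGTCATC"]
background = {a: 0.25 for a in ALPHABET}

scores, signal = build_profile(sites, background)
null = score_distribution(scores, [background] * len(scores))
alt = score_distribution(scores, signal)

n_windows = 500 - len(scores) + 1          # windows in a length-500 sequence
t = choose_threshold(null, n_windows, alpha=0.05)
power = tail(alt, t)                       # chance a true site scores >= t
print(f"threshold={t / 10:.1f} bits, power={power:.3f}")
```

Rerunning the last lines with a different `background` composition shows how the threshold and the power shift with the background model, which is the kind of sensitivity the abstract refers to. For very short profiles, `choose_threshold` may return a value above the maximum attainable score, in which case the power is zero at the requested significance level.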
