Computing the P-value of the information content from an alignment of multiple sequences.

Author information

Nagarajan Niranjan, Jones Neil, Keich Uri

Affiliation

Computer Science Department, 4130 Upson Hall, Cornell University, Ithaca, NY 14853, USA.

Publication information

Bioinformatics. 2005 Jun;21 Suppl 1:i311-8. doi: 10.1093/bioinformatics/bti1044.

DOI:10.1093/bioinformatics/bti1044

PMID:15961473

Abstract

MOTIVATION

The efficient and accurate computation of P-values is an essential requirement for motif-finding and alignment tools. We show that the approximation algorithms used in two popular motif-finding programs, MEME and Consensus, can fail to accurately compute the P-value.

RESULTS

We present two new algorithms: one for the evaluation of the P-values of a range of motif scores, and a faster one for the evaluation of the P-value of a single motif score. Both exhibit more reliability than existing algorithms, and the latter algorithm is comparable in speed to the fastest existing method.
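To make the quantity concrete, here is a brute-force sketch of the null distribution of an alignment's information content. It is only an illustration and not either of the algorithms presented in the paper. It assumes a gapless alignment of N sequences and L columns whose letters are drawn i.i.d. from a positive background distribution b. It also assumes the score is I = Σ_j Σ_a n_ja ln(n_ja / (N b_a)), where n_ja counts letter a in column j. Each column's exact score distribution is obtained by enumerating every count vector and rounding the scores onto a lattice of width `step`. That distribution is convolved L times, and tail sums then give P(I ≥ s) for every lattice score at once. The function names (`compositions`, `column_score_pmf`, `convolve`, `entropy_pvalues`, `pvalue`) and the `step` parameter are hypothetical.

```python
import math
from collections import defaultdict


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def column_score_pmf(n_seqs, background, step):
    """Null pmf of one column's score, rounded onto a lattice of width `step`.

    A column with letter counts n_a scores sum_a n_a * ln(n_a / (n_seqs * b_a)),
    with 0 * ln(0) taken as 0. Returns {lattice index: probability}.
    """
    pmf = defaultdict(float)
    log_n_fact = math.lgamma(n_seqs + 1)
    for counts in compositions(n_seqs, len(background)):
        log_prob = log_n_fact  # multinomial probability, accumulated in log space
        score = 0.0
        for n_a, b_a in zip(counts, background):
            log_prob += n_a * math.log(b_a) - math.lgamma(n_a + 1)
            if n_a > 0:
                score += n_a * math.log(n_a / (n_seqs * b_a))
        pmf[round(score / step)] += math.exp(log_prob)
    return dict(pmf)


def convolve(p, q):
    """Distribution of the sum of two independent lattice-valued scores."""
    out = defaultdict(float)
    for i, p_i in p.items():
        for j, q_j in q.items():
            out[i + j] += p_i * q_j
    return dict(out)


def entropy_pvalues(n_seqs, n_cols, background=(0.25,) * 4, step=0.01):
    """Return {lattice index k: P(score index >= k)} for the whole alignment.

    Assumes independent, gapless columns with i.i.d. letters drawn from a
    background distribution whose probabilities are all positive.
    """
    column = column_score_pmf(n_seqs, background, step)
    total = {0: 1.0}  # distribution of the score of zero columns
    for _ in range(n_cols):
        total = convolve(total, column)
    tail, running = {}, 0.0
    for k in sorted(total, reverse=True):  # accumulate from the highest score down
        running += total[k]
        tail[k] = running
    return tail


def pvalue(tail, score, step=0.01):
    """P(I >= score) under the lattice approximation; `step` must match."""
    k0 = round(score / step)
    candidates = [k for k in tail if k >= k0]
    return tail[min(candidates)] if candidates else 0.0
```

For example, `pvalue(entropy_pvalues(10, 5), 20.0)` estimates the P-value of a score of 20 nats for 10 sequences and 5 columns. Keeping the whole tail table answers queries for a range of scores from one computation. A smaller `step` reduces the rounding error, which accumulates across columns, but it enlarges the lattice. The enumeration of count vectors also grows quickly with N and the alphabet size. Both factors limit this naive approach.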

AVAILABILITY

The algorithms described in this paper are available from http://www.cs.cornell.edu/~keich

Similar articles

Computing the P-value of the information content from an alignment of multiple sequences.

Bioinformatics. 2005 Jun;21 Suppl 1:i311-8. doi: 10.1093/bioinformatics/bti1044.

Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone.

Bioinformatics. 2006 Jul 15;22(14):e393-401. doi: 10.1093/bioinformatics/btl245.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Model-based prediction of sequence alignment quality.

Bioinformatics. 2008 Oct 1;24(19):2165-71. doi: 10.1093/bioinformatics/btn414. Epub 2008 Aug 4.

Finding motifs from all sequences with and without binding sites.

Bioinformatics. 2006 Sep 15;22(18):2217-23. doi: 10.1093/bioinformatics/btl371. Epub 2006 Jul 26.

SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures.

Bioinformatics. 2005 Sep 15;21(18):3615-21. doi: 10.1093/bioinformatics/bti582. Epub 2005 Jul 14.

MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.

Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23.

Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution.

Bioinformatics. 2005 Jun;21 Suppl 1:i413-22. doi: 10.1093/bioinformatics/bti1033.

ARCS: an aggregated related column scoring scheme for aligned sequences.

Bioinformatics. 2006 Oct 1;22(19):2326-32. doi: 10.1093/bioinformatics/btl398. Epub 2006 Jul 26.

T-Coffee: A novel method for fast and accurate multiple sequence alignment.

J Mol Biol. 2000 Sep 8;302(1):205-17. doi: 10.1006/jmbi.2000.4042.

Cited by

Markov chains improve the significance computation of overlapping genome annotations.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i203-i211. doi: 10.1093/bioinformatics/btac255.

STREME: accurate and versatile sequence motif discovery.

Bioinformatics. 2021 Sep 29;37(18):2834-2840. doi: 10.1093/bioinformatics/btab203.

Parametric bootstrapping for biological sequence motifs.

BMC Bioinformatics. 2016 Oct 6;17(1):406. doi: 10.1186/s12859-016-1246-8.

Accurate computation of survival statistics in genome-wide studies.

PLoS Comput Biol. 2015 May 7;11(5):e1004071. doi: 10.1371/journal.pcbi.1004071. eCollection 2015 May.

Improving MEME via a two-tiered significance analysis.

Bioinformatics. 2014 Jul 15;30(14):1965-73. doi: 10.1093/bioinformatics/btu163. Epub 2014 Mar 24.

STEME: a robust, accurate motif finder for large data sets.

PLoS One. 2014 Mar 13;9(3):e90735. doi: 10.1371/journal.pone.0090735. eCollection 2014.

Towards biological characters of interactions between transcription factors and their DNA targets in mammals.

BMC Genomics. 2012 Aug 13;13:388. doi: 10.1186/1471-2164-13-388.

Towards a theoretical understanding of false positives in DNA motif finding.

BMC Bioinformatics. 2012 Jun 27;13:151. doi: 10.1186/1471-2105-13-151.

Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.

Bioinformatics. 2012 Jun 15;28(12):i215-23. doi: 10.1093/bioinformatics/bts210.

Assessing the effects of symmetry on motif discovery and modeling.

PLoS One. 2011;6(9):e24908. doi: 10.1371/journal.pone.0024908. Epub 2011 Sep 20.
