ConFind: a robust tool for conserved sequence identification.

Author information

Smagala James A, Dawson Erica D, Mehlmann Martin, Townsend Michael B, Kuchta Robert D, Rowlen Kathy L

Affiliation

Department of Chemistry and Biochemistry, The University of Colorado at Boulder, UCB #215, Boulder, CO 80309, USA.

Publication information

Bioinformatics. 2005 Dec 15;21(24):4420-2. doi: 10.1093/bioinformatics/bti719. Epub 2005 Oct 20.

DOI:10.1093/bioinformatics/bti719

PMID:16239306

Abstract

SUMMARY

ConFind (conserved region finder) identifies regions of conservation in multiple sequence alignments that can serve as diagnostic targets. Designed to work with a large number of closely related, highly variable sequences, ConFind provides robust handling of alignments containing partial sequences and ambiguous characters. Conserved regions are defined in terms of four parameters: the minimum region length, the maximum informational entropy (variability) per position, the number of exceptions allowed to the maximum entropy criterion, and the minimum number of sequences that must contain a non-ambiguous character at a position for that position to be considered for inclusion in a conserved region. In a comparison of the entropy calculated for an alignment of 95 influenza A hemagglutinin sequences with random deletions, ConFind reduced the average error by 98% relative to the 'Find Conserved Regions' option in BioEdit.
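
The four criteria named in the summary can be illustrated with a short Python sketch. This is not ConFind's own code: the function names, default thresholds, greedy left-to-right scan, and the choice to treat every character other than A, C, G, or T (gaps, N, IUPAC codes) as ambiguous are all assumptions made for illustration. Entropy here is Shannon entropy in bits, computed only over the unambiguous characters in each column.

```python
import math
from collections import Counter

# Assumption: only these characters count as non-ambiguous.
UNAMBIGUOUS = set("ACGT")


def column_entropy(column):
    """Return (Shannon entropy in bits, number of unambiguous characters)."""
    counts = Counter(ch for ch in column.upper() if ch in UNAMBIGUOUS)
    n = sum(counts.values())
    if n == 0:
        return 0.0, 0
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return entropy, n


def find_conserved_regions(alignment, min_length=3, max_entropy=0.2,
                           max_exceptions=0, min_sequences=2):
    """Greedy scan returning half-open (start, end) column ranges.

    Hypothetical simplification: a column with too few unambiguous
    characters always ends a region; a column with enough data but too
    much entropy is an "exception" and is tolerated up to max_exceptions.
    """
    n_cols = len(alignment[0])
    status = []
    for i in range(n_cols):
        entropy, n = column_entropy("".join(seq[i] for seq in alignment))
        if n < min_sequences:
            status.append("break")
        elif entropy <= max_entropy:
            status.append("ok")
        else:
            status.append("exception")

    regions = []
    start = 0
    while start < n_cols:
        if status[start] != "ok":
            start += 1
            continue
        last_ok = start   # last conserved column seen in this region
        exceptions = 0
        j = start + 1
        while j < n_cols and status[j] != "break":
            if status[j] == "exception":
                if exceptions == max_exceptions:
                    break
                exceptions += 1
            else:
                last_ok = j
            j += 1
        # Trailing exceptions are trimmed: the region ends at last_ok.
        if last_ok - start + 1 >= min_length:
            regions.append((start, last_ok + 1))
        start = last_ok + 1
    return regions


# Toy alignment (made-up data) with a partial sequence and ambiguous bases.
alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTNN",
    "ACGCAC--AC",
]
print(find_conserved_regions(alignment, min_sequences=3))
```

In this toy alignment, column 3 is variable, so with no exceptions allowed the scan splits the alignment into two regions; allowing one exception merges them, and raising `min_sequences` to 4 excludes the columns that contain gaps or `N`.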

REQUIREMENTS

ConFind requires Python 2.3, but Python 2.4 or an upgrade of the optparse module to Optik 1.5 is suggested. The program is known to run under Linux and DOS.
