Brodzik Andrzej K
The MITRE Corporation, Bedford, MA 01730, USA.
Bioinformatics. 2007 Mar 15;23(6):694-700. doi: 10.1093/bioinformatics/btl674. Epub 2007 Jan 19.
One of the main tasks of DNA sequence analysis is identification of repetitive patterns. DNA symbol repetitions play a key role in a number of applications, including prediction of gene and exon locations, identification of diseases, reconstruction of human evolutionary history and DNA forensics.
A new approach to identifying tandem repeats in DNA sequences is proposed. The approach refines a previously considered method based on the complex periodicity transform. The refinement is obtained, among other things, by mapping DNA symbols to pure quaternions. This mapping gives the transform an enhanced, symbol-balanced sensitivity to DNA patterns and yields an unambiguous threshold selection criterion. The computational efficiency of the transform is further improved, and the coupling of the computation with the period value is removed, which facilitates parallel implementation of the algorithm. Additionally, a post-processing stage is added to the algorithm, enabling unambiguous display of results in a convenient graphical format. A comparison of the quaternionic periodicity transform with two well-known pattern detection techniques shows that the new approach is competitive with both in detecting exact and approximate repeats.
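To illustrate what a symbol-balanced mapping to pure quaternions can look like, here is a minimal Python sketch. It is not the quaternionic periodicity transform itself, which the abstract does not define. The tetrahedral assignment in `BASE_TO_VECTOR` and the helper names `encode`, `lag_score` and `best_period` are assumptions made for illustration only. Under this assumed mapping, every pair of distinct bases has the same inner product, so no substitution is weighted more heavily than another.

```python
# Hypothetical tetrahedral mapping (an assumption, not taken from the paper):
# each base becomes a pure quaternion, i.e. one with zero real part.
# Only the (i, j, k) vector part is listed here.
BASE_TO_VECTOR = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, -1, 1),
    "T": (-1, 1, -1),
}


def quaternion_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def conjugate(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def encode(sequence):
    """Map a DNA string to a list of pure quaternions (real part 0)."""
    return [(0,) + BASE_TO_VECTOR[base] for base in sequence.upper()]


def lag_score(sequence, period):
    """Illustrative similarity of symbols `period` positions apart.

    For pure quaternions a and b, the real part of a * conj(b) equals the
    inner product of their vector parts: 3 for equal bases and -1 for any
    two distinct bases under the mapping above. The result is normalized
    so that 1.0 means every compared pair matches and -1/3 means none do.
    """
    quats = encode(sequence)
    pairs = len(quats) - period
    if period <= 0 or pairs <= 0:
        raise ValueError("period must be positive and shorter than the sequence")
    total = 0.0
    for n in range(pairs):
        total += quaternion_product(quats[n], conjugate(quats[n + period]))[0]
    return total / (3 * pairs)


def best_period(sequence, max_period):
    """Smallest period in 1..max_period with the highest lag score."""
    return max(range(1, max_period + 1), key=lambda p: lag_score(sequence, p))
```

For an exact tandem repeat such as `"ACGACGACGACG"`, `lag_score` reaches its maximum at period 3. Each period is scored independently of the others, so in this sketch the loop over candidate periods could be run in parallel.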