Rosandić Marija, Paar Vladimir, Gluncić Matko, Basar Ivan, Pavin Nenad
Department of Medicine, Zagreb University Hospital Center, Croatia.
Croat Med J. 2003 Aug;44(4):386-406.
To use a novel computational approach, the Key-string Algorithm (KSA), to identify and analyze arbitrarily large repetitive sequences and higher-order repeats (HORs) in noncoding DNA. The approach is based on a key string that plays the role of an arbitrarily constructed "computer enzyme".
A cluster of novel KSA-related methods was introduced and developed, combining computations on a very modest scale with visual inspection and graphical display of the analysis results. Sequence analysis software containing seven programs for KSA-related analyses was developed. The approach was demonstrated in a case study of alpha satellites and HORs in the human genomic sequence AC017075.8 (193277 bp) from the centromeric region of human chromosome 7. The KSA segmentation method was applied with the key strings DCCGTTT, GTA, and TTTC.
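The segmentation step can be illustrated with a minimal Python sketch. It assumes that a key string cuts the sequence immediately before each exact, non-overlapping occurrence, and that the resulting fragment lengths are then tallied. The function names `ksa_segment` and `length_distribution`, the toy sequence, and these cutting conventions are illustrative assumptions, not the authors' software; characters are matched literally, so ambiguity codes are not expanded.

```python
from collections import Counter


def ksa_segment(sequence: str, key: str) -> list[str]:
    """Cut `sequence` just before each non-overlapping occurrence of `key`.

    Assumed convention: every fragment except possibly the first starts
    with the key string, and joining the fragments restores the input.
    """
    if not key:
        raise ValueError("key string must be non-empty")
    cuts = []
    pos = sequence.find(key)
    while pos != -1:
        cuts.append(pos)
        # Resume the search after this match (non-overlapping occurrences).
        pos = sequence.find(key, pos + len(key))
    bounds = [0] + [c for c in cuts if c != 0] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def length_distribution(fragments: list[str]) -> Counter:
    """Count how often each fragment length occurs."""
    return Counter(len(fragment) for fragment in fragments)


if __name__ == "__main__":
    # Toy input: a 2-bp leader followed by three copies of a 7-bp repeat unit.
    toy = "TT" + "GTACCCC" * 3
    pieces = ksa_segment(toy, "GTA")
    print(pieces)
    print(length_distribution(pieces))
```

In a sequence dominated by tandem repeats, a well-chosen key string would be expected to produce many fragments of the same length, so a sharp peak in the length distribution points to a candidate repeat unit.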
Fifty-five copies of 2734-bp 16mer HORs were identified and investigated, and a start-string TTTTTTAAAAA was identified. The HOR-matrix was constructed and used for graphical display of mutations. KSA identification of HORs in AC017075.8 was compared with the results of RepeatMasker and Tandem Repeats Finder, which identified alpha monomers in AC017075.8 but not the HORs. On the basis of the KSA study, centromere folding was described as an effect of HORs and super-HORs (3 x 2734 bp) in AC017075.8. The following novel KSA-based computational methods, easy to use and intended for computational "pedestrians", were demonstrated: color-HOR diagram, KSA-divergence method, 171-bp subsequence-convergence diagram, and total frequency distribution of the key-string subsequence lengths. The results were supplemented by Fast Fourier Transform analysis, employing a novel mapping of the symbolic genomic sequence into a numerical sequence.
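The idea of turning a symbolic sequence into numbers and looking for periodicity in its spectrum can be sketched as follows. The abstract does not specify the authors' mapping, so this sketch substitutes a simple binary indicator sequence (1 where a chosen base occurs, 0 elsewhere), and it evaluates a naive discrete Fourier transform instead of an FFT, which gives the same spectrum but is only practical for short inputs. The names `indicator`, `power_spectrum`, and `dominant_period` are hypothetical.

```python
import cmath


def indicator(sequence: str, base: str) -> list[float]:
    """Assumed mapping: 1.0 where `base` occurs, 0.0 elsewhere."""
    return [1.0 if ch == base else 0.0 for ch in sequence]


def power_spectrum(values: list[float]) -> list[float]:
    """Squared magnitude of the naive O(n^2) discrete Fourier transform."""
    n = len(values)
    spectrum = []
    for k in range(n):
        coeff = sum(
            v * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, v in enumerate(values)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def dominant_period(sequence: str, base: str) -> float:
    """Period (in bp) of the strongest non-zero frequency component."""
    values = indicator(sequence, base)
    n = len(values)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")
    mean = sum(values) / n
    # Remove the mean so the zero-frequency term does not dominate.
    spectrum = power_spectrum([v - mean for v in values])
    best_k = max(range(1, n // 2 + 1), key=lambda k: spectrum[k])
    return n / best_k


if __name__ == "__main__":
    # Toy input: a 5-bp unit repeated 12 times.
    print(dominant_period("ACGTT" * 12, "T"))
```

For a real sequence of this size, a library FFT would be the practical choice; the naive transform is used here only to keep the sketch free of dependencies.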
The KSA approach offers a simple and robust framework for a wide range of investigations of large repetitive sequences and HORs, requiring only very modest computations that can be carried out on a PC. Because the KSA method is HOR-oriented, identifying HORs is even easier than identifying the underlying alpha monomer itself. The approach allows easy identification of point mutations, insertions, and deletions with respect to the consensus. It may be useful in a wide range of investigations, with applications in forensic medicine, medical diagnosis of malignant diseases, biological evolution, and paleontology.
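Identification of point mutations with respect to a consensus can be illustrated with a short Python sketch. It assumes the repeat copies have already been extracted and have equal length, so only substitutions are detected; insertions and deletions would require an alignment step that is not shown. This is a generic majority-vote consensus, not the authors' HOR-matrix or KSA-divergence method, and the function names and toy copies are hypothetical.

```python
from collections import Counter


def consensus(copies: list[str]) -> str:
    """Majority-vote consensus over equal-length repeat copies."""
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("copies must be non-empty and of equal length")
    # zip(*copies) yields one column of bases per sequence position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


def point_mutations(copy: str, reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, observed_base) for each mismatch."""
    return [
        (i, ref, obs)
        for i, (obs, ref) in enumerate(zip(copy, reference))
        if obs != ref
    ]


if __name__ == "__main__":
    # Toy copies of a 6-bp repeat unit, two of them with one substitution.
    copies = ["ACGTAC", "ACGTAC", "ACCTAC", "TCGTAC"]
    reference = consensus(copies)
    for copy in copies:
        print(copy, point_mutations(copy, reference))
```

Listing the mismatches of every copy against the consensus, row by row, gives a simple tabular view of where individual copies diverge.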