Joint Institute for Nuclear Research, Dubna, Russia.
Horia Hulubei National Institute of Physics and Nuclear Engineering, Bucharest-Magurele, Romania.
Int J Mol Sci. 2020 Jun 30;21(13):4651. doi: 10.3390/ijms21134651.
The arrangement of A, C, G and T nucleotides in large DNA sequences of many prokaryotic and eukaryotic cells exhibits long-range correlations with fractal properties. Chaos game representation (CGR) of such DNA sequences, followed by a multifractal analysis, is a useful way to analyze the corresponding scaling properties. This approach provides a powerful visualization method to characterize their spatial inhomogeneity, and allows discrimination between mono- and multifractal distributions. However, in some cases, two different arbitrary point distributions may generate indistinguishable multifractal spectra. By using a new model based on multiplicative deterministic cascades, it is shown here that the small-angle scattering (SAS) formalism can be used to address this issue and to extract additional structural information. It is shown that the box-counting dimension given by multifractal spectra can be recovered from the scattering exponent of the SAS intensity in the fractal region. This approach is illustrated for point distributions of CGR data corresponding to , and DNA, and it is shown that for the latter two cases, SAS allows extraction of the fractal iteration number and the scaling factor corresponding to the "ACGT" square, or recovery of the number of bases. The results are compared with a model based on multiplicative deterministic cascades and with one that takes into account the existence of forbidden sequences in DNA. This allows a classification of the DNA sequences in terms of the random and deterministic fractal structures emerging in CGR.
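As an illustration of the two basic ingredients named in the abstract, the following minimal Python sketch builds the CGR point set of a sequence and estimates its box-counting dimension on dyadic grids. It is not the authors' code: the function names `cgr_points` and `box_counting_dimension`, the assignment of bases to corners of the unit square, and the range of grid levels are all assumptions made for this example, and the plain least-squares box count stands in for the full multifractal and SAS analysis described above.

```python
import math

# Assumed corner assignment for the "ACGT" square; conventions differ between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation of a DNA string as (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_counting_dimension(points, levels=range(1, 7)):
    """Least-squares slope of log N(eps) against log(1/eps), with eps = 2**-k."""
    xs, ys = [], []
    for k in levels:
        n = 2 ** k  # n x n grid of boxes with side 1/n
        occupied = {
            (min(int(px * n), n - 1), min(int(py * n), n - 1))
            for px, py in points
        }
        xs.append(k * math.log(2.0))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den
```

For a long, uniformly random sequence the estimate should come out close to 2, since the points fill the square. A sequence in which some words never occur (the "forbidden sequences" mentioned above) leaves empty sub-squares at every scale and gives a lower value. The estimate is only meaningful when the sequence is long enough to populate the finest grid level used.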