Department of Computer Engineering, University of Zanjan, Zanjan, Iran.
Department of Biology, Faculty of Sciences, University of Zanjan, Zanjan, Iran.
Sci Rep. 2019 Jul 17;9(1):10361. doi: 10.1038/s41598-019-46844-y.
Sequence data are deposited as unphased genotypes, so the location of a particular allele on a specific parental chromosome, or haplotype, cannot be identified directly. This study applied nonlinear time series modeling to haplotype sequences obtained by next-generation sequencing (NGS). To evaluate the chaotic behavior of haplotypes, we analyzed their whole sequences, as well as several subsequences from distinct haplotypes, in terms of the SNP distribution on their chromosomes. The analysis used chaos game representation (CGR) followed by two different scaling methods. Chaotic behavior was clearly present in most haplotype subsequences. To test the applicability of the proposed model, we determined the alleles at gap positions and at positions with low coverage, using chromosome subsequences in which 10% of each subsequence's alleles had been replaced by gaps. After each subsequence's CGR was converted into a coordinate series, a Local Projection (LP) method predicted the values at the ambiguous positions in that series. The average reconstruction rate across all input data exceeded 97%, demonstrating that applying this knowledge can effectively improve the reconstruction rate of given haplotypes.
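To illustrate the step of converting a sequence's CGR into a coordinate series, the following is a minimal sketch of the standard four-corner chaos game representation for nucleotide sequences. It is not the authors' implementation: the corner assignment in `CORNERS`, the starting point, and the function names `cgr_coordinates` and `base_from_point` are assumptions made for illustration, and the abstract does not specify the alphabet or the scaling methods used in the study.

```python
# Assumed corner layout of the unit square (a common convention,
# not confirmed by the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the CGR coordinate series of a nucleotide sequence.

    Each point is the midpoint between the previous point and the
    corner assigned to the current base.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points


def base_from_point(point):
    """Recover the base that produced a CGR point from its quadrant.

    Reliable for sequences short enough that floating-point values
    do not saturate at 0.0 or 1.0.
    """
    x, y = point
    if x < 0.5:
        return "A" if y < 0.5 else "C"
    return "T" if y < 0.5 else "G"


if __name__ == "__main__":
    series = cgr_coordinates("ACGT")
    decoded = "".join(base_from_point(p) for p in series)
    print(series)
    print(decoded)
```

Because each point falls in the quadrant of the base that produced it, a coordinate predicted for a gap position can in principle be mapped back to an allele with `base_from_point`. This is one plausible reading of how predicted coordinates relate to reconstructed alleles; the paper's actual procedure may differ.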