Thakur Vivek, Azad Rajeev K, Ramaswamy Ram
Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India.
Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Jan;75(1 Pt 1):011915. doi: 10.1103/PhysRevE.75.011915. Epub 2007 Jan 17.
We introduce Markov models for segmentation of symbolic sequences, extending a segmentation procedure based on the Jensen-Shannon divergence that was introduced earlier. Higher-order Markov models are more sensitive to the details of local patterns; in genome analysis, this makes it possible to segment a sequence at biologically meaningful positions. We show the advantage of higher-order Markov-model-based segmentation procedures in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species. Applied to the E. coli K12 genome, the method accurately identifies the boundaries of genomic islands, cryptic prophages, and horizontally acquired regions.
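The idea of a single segmentation step can be illustrated with a minimal sketch. It assumes that the order-m generalization replaces the Shannon entropy in the Jensen-Shannon divergence D = H(S) - (l1/l) H(S1) - (l2/l) H(S2) with the conditional entropy of an order-m Markov chain, estimated from (m+1)-mer counts. The function names `markov_entropy` and `best_split`, the `min_len` guard, and the exhaustive O(L^2) scan are hypothetical illustrative choices, not the authors' implementation. Recursive re-segmentation and any significance-based stopping criterion are omitted.

```python
import math
from collections import Counter


def markov_entropy(seq: str, order: int) -> float:
    """Conditional entropy (bits/symbol) of an order-`order` Markov chain,
    estimated by maximum likelihood from the (order+1)-mer counts of `seq`.
    With order=0 this is the ordinary Shannon entropy of symbol frequencies."""
    if len(seq) <= order:
        return 0.0
    # Count overlapping (order+1)-mers: a context of `order` symbols plus the next symbol.
    words = Counter(seq[i:i + order + 1] for i in range(len(seq) - order))
    # Context counts are the (order+1)-mer counts marginalized over the last symbol.
    contexts: Counter = Counter()
    for word, count in words.items():
        contexts[word[:-1]] += count
    n = sum(words.values())
    h = 0.0
    for word, count in words.items():
        # -P(context, symbol) * log2 P(symbol | context)
        h -= (count / n) * math.log2(count / contexts[word[:-1]])
    return h


def best_split(seq: str, order: int = 0, min_len: int = 10):
    """Scan every allowed cut point and return (position, divergence) for the cut
    that maximizes D = H(S) - (l1/l) H(S1) - (l2/l) H(S2), with H the order-`order`
    conditional entropy (assumed generalization of the Jensen-Shannon divergence)."""
    length = len(seq)
    h_total = markov_entropy(seq, order)
    best = (None, -math.inf)
    for i in range(min_len, length - min_len + 1):
        d = (h_total
             - (i / length) * markov_entropy(seq[:i], order)
             - ((length - i) / length) * markov_entropy(seq[i:], order))
        if d > best[1]:
            best = (i, d)
    return best
```

As a usage example, `best_split("AT" * 50 + "GC" * 50, order=0)` places the cut at position 100, where the base composition changes. For `"AT" * 50 + "AATT" * 25`, both halves have the same A/T composition, so the order-0 divergence stays near zero, whereas with `order=1` the dinucleotide statistics differ and the cut lands at or next to position 100. This mirrors, in a toy setting, the point that higher-order models are more sensitive to local patterns.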