Bose Ranjan, Thiel Gerhard, Hamacher Kay
Department of Electrical Engineering, IIT Delhi, Hauz Khas, New Delhi 110016, India.
Department of Biology, Technische Universität Darmstadt, 64287 Darmstadt, Germany.
Viruses. 2014 May 30;6(6):2259-67. doi: 10.3390/v6062259.
We present a method for clustering genomic sequences based on variations in local entropy. We analyzed the distributions of block entropies of virus and plant genomes and observed distinct patterns for the two groups. These distributions, which describe the local entropic variability of each genome, are used to cluster the genomes according to their pairwise Jensen-Shannon (JS) distances. An analysis of the JS distances between the genomes of all viruses that infect Chlorella algae reveals the host specificity of these viruses. We illustrate the efficacy of this entropy-based clustering technique by segregating plant and virus genomes into separate bins.
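The pipeline in the abstract (local block entropies → per-genome entropy distribution → pairwise JS distances → clustering) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. "Block entropy" is taken here as the Shannon entropy of overlapping k-mer frequencies inside each non-overlapping window. The window length (`window=200`), block length (`k=3`), bin count (`bins=20`), the distance threshold, the single-linkage grouping, the toy sequences, and all function names are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def block_entropy(seq, k):
    """Shannon entropy (bits) of the overlapping k-mer ("block") frequencies in seq."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not blocks:
        return 0.0
    n = len(blocks)
    return -sum((c / n) * math.log2(c / n) for c in Counter(blocks).values())


def local_entropy_distribution(seq, window=200, k=3, bins=20):
    """Normalized histogram of block entropies over non-overlapping windows.

    window, k and bins are hypothetical defaults, not values from the paper.
    """
    starts = range(0, len(seq) - window + 1, window)
    if not starts:
        raise ValueError("sequence is shorter than one window")
    max_h = 2.0 * k  # upper bound for the ACGT alphabet: log2(4**k) bits
    hist = [0] * bins
    for start in starts:
        h = block_entropy(seq[start:start + window], k)
        hist[min(int(h / max_h * bins), bins - 1)] += 1
    return [c / len(starts) for c in hist]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i = m_i > 0 wherever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def single_linkage_clusters(names, dists, threshold):
    """Hypothetical grouping step: merge every pair at distance <= threshold."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in dists.items():
        if d <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy data (hypothetical): two random sequences and two low-complexity repeats.
rng = random.Random(0)
genomes = {
    "rand_a": "".join(rng.choice("ACGT") for _ in range(4000)),
    "rand_b": "".join(rng.choice("ACGT") for _ in range(4000)),
    "repeat_a": "ACGTTGCA" * 500,
    "repeat_b": "AACCGGTT" * 500,
}
profiles = {name: local_entropy_distribution(s) for name, s in genomes.items()}
dists = {(a, b): js_distance(profiles[a], profiles[b])
         for a, b in combinations(sorted(genomes), 2)}
clusters = single_linkage_clusters(sorted(genomes), dists, threshold=0.5)
print(clusters)
```

Every window of each repeat starts at the same phase and therefore has the same entropy, so the two repeats yield identical distributions (JS distance 0) and fall into one group, while their histograms share no bins with those of the random sequences (JS distance 1). A different clustering scheme, such as hierarchical clustering on the full distance matrix, could replace the last step without changing the rest.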