Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
Laboratory of Virus Control, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan.
PLoS Comput Biol. 2020 Nov 30;16(11):e1008422. doi: 10.1371/journal.pcbi.1008422. eCollection 2020 Nov.
The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we present a clustering algorithm for genome-wide open chromatin data that uses a new data reduction method. The method represents the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between these strings. Combined with a systematically optimized set of peaks, the algorithm enables us to quantitatively evaluate differences between samples of hematopoietic cells and to classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
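The data reduction described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each sample is encoded with one bit per reference peak (1 if any peak called in that sample overlaps the reference peak, 0 otherwise) and that intervals are half-open; the summary does not specify either detail. The names `binarize`, `hamming`, and `pairwise_hamming`, the sample labels, and all coordinates are hypothetical toy values.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# A peak is (chromosome, start, end); half-open intervals are an assumption.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def binarize(reference: List[Peak], sample_peaks: List[Peak]) -> List[int]:
    """Encode a sample as one bit per reference peak (1 = peak present)."""
    return [
        1 if any(overlaps(ref, peak) for peak in sample_peaks) else 0
        for ref in reference
    ]


def hamming(u: List[int], v: List[int]) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(u, v))


def pairwise_hamming(vectors: Dict[str, List[int]]) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of samples."""
    return {
        (a, b): hamming(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Hypothetical reference peak set and per-sample peak calls (toy data).
reference = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr2", 900, 1000),
]
samples = {
    "sample_a": [("chr1", 120, 180), ("chr2", 60, 100)],
    "sample_b": [("chr1", 150, 250), ("chr2", 80, 160), ("chr2", 950, 990)],
    "sample_c": [("chr1", 520, 580), ("chr2", 910, 1000)],
}

vectors = {name: binarize(reference, peaks) for name, peaks in samples.items()}
distances = pairwise_hamming(vectors)

for pair, dist in distances.items():
    print(pair, dist)
```

The resulting distance table can be passed to any standard clustering routine. The quadratic overlap scan is kept for readability; real peak sets would call for sorted intervals or an interval tree.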