Aggarwal Manu, Periwal Vipul
Laboratory of Biological Modeling, NIDDK, National Institutes of Health, 31 Center Dr, Bethesda, 20892, MD, United States.
J Comput Sci. 2024 Jul;79. doi: 10.1016/j.jocs.2024.102290. Epub 2024 Apr 20.
Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of sparse Vietoris-Rips complexes on larger data sets, up to and including dimension two and over the field ℤ₂. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set containing approximately three million points. Extant algorithms were unable to process it, whereas Dory processed it within five minutes, using less than five GB of memory. Results show that the topology of the human genome changes significantly upon treatment with auxin, a molecule that degrades cohesin, corroborating the hypothesis that cohesin plays a crucial role in loop formation in DNA.
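To make the terms in the abstract concrete, the following is a minimal sketch of persistent homology in dimension zero only (connected components) for a sparse Vietoris-Rips filtration. It is not Dory's algorithm, which the abstract does not describe in detail; it is the standard union-find approach, shown purely as an illustration. The function name `h0_persistence` and the `threshold` parameter are assumptions introduced here, and the brute-force pairwise distance loop would not scale to millions of points.

```python
from itertools import combinations
from math import dist


def h0_persistence(points, threshold):
    """Return (birth, death) pairs for connected components (dimension 0).

    Illustrative only: a textbook union-find pass, not the Dory algorithm.
    """
    # Sparse Vietoris-Rips 1-skeleton: keep only edges no longer than threshold,
    # sorted by length so they enter the filtration in order.
    edges = sorted(
        (dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= threshold
    )
    parent = list(range(len(points)))

    def find(x):
        # Find the component representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # This edge merges two components: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    # Components that never merge below the threshold persist indefinitely.
    roots = {find(i) for i in range(len(points))}
    pairs.extend((0.0, float("inf")) for _ in roots)
    return pairs
```

For three collinear points at x = 0, 1 and 5 with a threshold of 2, only the edge of length 1 enters the filtration, so one component dies at scale 1 and two components persist. Higher-dimensional features such as the loops discussed in the abstract require reducing a boundary matrix over ℤ₂, which this sketch does not attempt.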