Liang Qiushi, Zhao Shengjie, Chen Lingxi, Li Shuai Cheng
School of Computer Science and Technology, Tongji University, Shanghai, 201804, China.
Department of Computer Science, City University of Hong Kong, 999077, Hong Kong, China.
Comput Struct Biotechnol J. 2025 Apr 30;27:1864-1886. doi: 10.1016/j.csbj.2025.04.037. eCollection 2025.
Entropy quantifies the limits of information compression and provides a theoretical foundation for exploring complex structures in large-scale graphs. However, effective metrics are needed to capture the intricate structural details in biological graphs. In this paper, we introduce topology entropy to quantify the complexity of biological graphs and show that minimizing the associated entropy is equivalent to optimal graph partitioning. We develop two methods, TEC-O and TEC-U, for partitioning ordered and unordered biological graphs, respectively. TEC-O is applied to identify Topologically Associated Domains (TADs) in Hi-C contact maps, while TEC-U is used for cell clustering in single-cell sequencing data. Results from simulated datasets demonstrate that topology entropy is robust to noise and effectively captures structural information, outperforming existing methods. Experiments on Hi-C data from five cell lines and ten single-cell sequencing datasets show that TEC-O and TEC-U achieve the highest accuracy in TAD detection and cell clustering, respectively, providing biologically meaningful insights.