Gong Haiyan, Zhang Dawei, Zhang Xiaotong
Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China.
School of Computer and Communication Engineering, Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing, 100083, China.
Comput Struct Biotechnol J. 2023 Sep 27;21:4759-4768. doi: 10.1016/j.csbj.2023.09.019. eCollection 2023.
Topologically associated domains (TADs) play a pivotal role in disease detection. This study introduces TOAST, a novel TAD recognition approach that combines graph auto-encoders with clustering. TOAST treats each genomic bin as a node of a graph and uses the Hi-C contact matrix as the graph's adjacency matrix. A graph auto-encoder then generates informative embeddings that serve as features. Next, the unsupervised clustering algorithm HDBSCAN assigns a label to each genomic bin, and contiguous regions sharing the same label are identified as TADs. Experiments on several simulated Hi-C data sets show that TOAST quickly and accurately identifies TADs from different types of simulated Hi-C contact matrices, outperforming existing algorithms. We also determined the anchoring ratio of TAD boundaries across different TAD recognition algorithms, obtaining average anchoring ratios of 0.66 for CTCF, 0.47 for SMC3, 0.54 for RAD21, 0.27 for POLR2A, 0.24 for H3K36me3, 0.12 for H3K9me3, 0.32 for H3K4me3, 0.41 for H3K4me1, 0.26 for enhancers, and 0.13 for promoters. In conclusion, TOAST can quickly identify TAD boundary parameters that are easy to understand and biologically significant. The TOAST web server can be accessed at http://223.223.185.189:4005/. The TOAST code is available at https://github.com/ghaiyan/TOAST.
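The final step described in the abstract, turning per-bin cluster labels into TADs, can be illustrated with a minimal sketch. This is not the TOAST implementation: the function name `labels_to_tads` and the parameters `resolution`, `min_bins`, and `noise_label` are hypothetical and introduced only for illustration. The sketch assumes the labels are given in genomic order and that noise bins carry the label -1, which is the convention HDBSCAN uses for unclustered points.

```python
from itertools import groupby


def labels_to_tads(labels, resolution=1, min_bins=2, noise_label=-1):
    """Group consecutive bins that share a cluster label into intervals.

    labels      -- one cluster label per genomic bin, in genomic order
    resolution  -- bin size in base pairs (1 keeps coordinates in bin units)
    min_bins    -- hypothetical filter: runs shorter than this are dropped
    noise_label -- label meaning "no cluster" (HDBSCAN marks noise as -1)

    Returns a list of (start, end) tuples with an exclusive end coordinate.
    """
    tads = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        end = start + length  # exclusive end, in bin units
        if label != noise_label and length >= min_bins:
            tads.append((start * resolution, end * resolution))
        start = end
    return tads


# Toy labels for 11 bins; label 0 occurs in two separate runs,
# which yields two separate intervals.
example_labels = [0, 0, 0, 1, 1, -1, 1, 1, 1, 0, 0]
print(labels_to_tads(example_labels))
# [(0, 3), (3, 5), (6, 9), (9, 11)]
```

Because only contiguous runs are merged, two distant regions that happen to receive the same cluster label are reported as separate intervals. How TOAST itself treats noise bins and very short runs is not stated in the abstract, so the filtering shown here is an assumption.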