Shen Wei, Zhang Ping, Jiang Yiwei, Tao Hailin, Zi Zhike, Li Li
College of Informatics, Huazhong Agricultural University, Wuhan, China.
Hubei Hongshan Laboratory, Hubei Key Laboratory of Agricultural Bioinformatics, Wuhan, China.
Genome Biol. 2024 Dec 2;25(1):302. doi: 10.1186/s13059-024-03445-x.
Topologically associating domains (TADs) are essential units of genome architecture that influence transcriptional regulation and disease. Although numerous methods have been proposed for TAD identification, the task remains challenging because of complex backgrounds and nested TAD structures. We introduce HTAD, a human-in-the-loop TAD caller that combines machine learning with human supervision to achieve high accuracy. HTAD begins with feature extraction for potential TAD border pairs, followed by an interactive labeling process driven by active learning. Performance assessments on public curated and synthetic datasets show that HTAD outperforms other state-of-the-art methods and reveal highly hierarchical TAD structures. HTAD thus offers a human-in-the-loop solution for detecting complex genomic patterns.
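The interactive labeling step can be illustrated with a generic pool-based active learning loop. The sketch below is not HTAD's implementation: the abstract does not specify the features, the classifier, or the query strategy, so the two-value feature tuples, the logistic regression, the uncertainty-sampling rule, and the `simulated_curator` function standing in for the human annotator are all assumptions made for illustration.

```python
import math
import random


def predict_proba(w, b, x):
    """Probability that candidate border pair x is a true TAD (logistic model)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a small logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def active_learning_loop(pool, oracle, seed_idx, n_rounds=10):
    """Repeatedly ask the oracle to label the candidate the model is least sure about."""
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(n_rounds):
        w, b = train_logistic([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: pick the candidate whose score is closest to 0.5.
        query = min(unlabeled, key=lambda i: abs(predict_proba(w, b, pool[i]) - 0.5))
        labeled[query] = oracle(pool[query])
    # Refit once more so the returned model uses every collected label.
    w, b = train_logistic([pool[i] for i in labeled], list(labeled.values()))
    return w, b, labeled


# Hypothetical candidates: (border strength, interior enrichment), both in [0, 1].
random.seed(0)
pool = [(random.random(), random.random()) for _ in range(200)]


def simulated_curator(x):
    """Stand-in for the human annotator; a real session would prompt a person."""
    return 1 if x[0] + x[1] > 1.0 else 0


w, b, labeled = active_learning_loop(pool, simulated_curator, [0, 1, 2, 3], n_rounds=15)
print(f"labels collected: {len(labeled)}")
```

The point of the design is that each round spends the annotator's effort on the single candidate closest to the current decision boundary, so a small number of labels can shape the classifier more than the same number of randomly chosen ones.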