

Examining the dynamics of three-dimensional genome organization with multitask matrix factorization.

Author information

Lee Da-Inn, Roy Sushmita

Affiliations

Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA.


Publication information

Genome Res. 2025 May 2;35(5):1179-1193. doi: 10.1101/gr.279930.124.

Abstract

Three-dimensional (3D) genome organization, which determines how the DNA is packaged inside the nucleus, has emerged as a key component of the gene regulation machinery. High-throughput chromosome conformation data sets, such as Hi-C, have become available across multiple conditions and time points, offering a unique opportunity to examine changes in 3D genome organization and link them to phenotypic changes in normal and disease processes. However, systematic detection of higher-order structural changes across multiple Hi-C data sets remains a major challenge. Existing computational methods either do not model higher-order structural units or cannot model dynamics across more than two conditions of interest. We address these limitations with tree-guided integrated factorization (TGIF), a generalizable multitask nonnegative matrix factorization (NMF) approach that can be applied to time series or hierarchically related biological conditions. TGIF can identify large-scale changes at the compartment or subcompartment levels, as well as local changes at boundaries of topologically associated domains (TADs). Based on benchmarking in simulated and real Hi-C data, TGIF boundaries are more accurate and reproducible across differential levels of noise and sources of technical artifacts, and are more enriched in CTCF. Application to three multisample mammalian data sets shows that TGIF can detect differential regions at compartment, subcompartment, and boundary levels that are associated with significant changes in regulatory signals and gene expression enriched in tissue-specific processes. Finally, we leverage TGIF boundaries to prioritize sequence variants for multiple phenotypes from the NHGRI GWAS catalog. Taken together, TGIF is a flexible tool to examine 3D genome organization dynamics across disease and developmental processes.
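The abstract does not spell out how TGIF works, so the following is only a minimal sketch of the plain NMF building block it mentions, not the tree-guided multitask method itself. It assumes standard Lee-Seung multiplicative updates for a squared-error objective and a hypothetical toy contact matrix with two dense blocks. The function name `nmf`, the toy data, and the argmax-based bin labeling are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF, X ~= U @ V.T, via multiplicative updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    U = rng.random((n, k)) + eps
    V = rng.random((m, k)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the objective ||X - U V^T||_F^2.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Hypothetical toy "contact matrix": 8 bins, two dense 4-bin blocks
# on the diagonal over a weak uniform background.
X = np.full((8, 8), 0.05)
X[:4, :4] = 1.0
X[4:, 4:] = 1.0

U, V = nmf(X, k=2)

# Rescale each factor column to a maximum of 1 so that columns are
# comparable, then label each bin by its dominant factor.
labels = np.argmax(U / U.max(axis=0), axis=1)

# Relative reconstruction error of the rank-2 approximation.
err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(labels, err)
```

In this toy setting, bins in the same dense block should receive the same label, which loosely mirrors how low-dimensional factors can be clustered into domain-like units. A multitask variant would additionally couple the factors of related data sets, and that coupling is not shown here.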


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b0/12047540/7af1b37194cd/1179f01.jpg
