Novel AI-powered computational method using tensor decomposition for identification of common optimal bin sizes when integrating multiple Hi-C datasets.

Authors

Taguchi Y-H, Turki Turki

Affiliations

Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan.

Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.

Publication information

Sci Rep. 2025 Mar 3;15(1):7459. doi: 10.1038/s41598-025-91355-8.

DOI: 10.1038/s41598-025-91355-8
PMID: 40033014
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11876364/

Abstract

Identifying the optimal bin size (or resolution) for integrating multiple Hi-C datasets is challenging because the bin size must be common to all datasets, whereas the dependence of quality on bin size can vary from dataset to dataset. Moreover, common structures should not be sought at bin sizes smaller than the optimal one: below it, the common structure is no longer the primary structure, even when the number of mapped short reads per bin is increased. In that regime there are no common structures at finer resolutions, which suggests that individual Hi-C datasets may have to be analyzed separately at bin sizes smaller than the optimal one. Quality assessments of individual datasets therefore have a limited ability to determine the best bin size for all datasets. In this study, we propose a novel application of tensor decomposition (TD)-based unsupervised feature extraction (FE) to choose the optimal bin size for the integration of multiple Hi-C datasets. TD-based unsupervised FE exhibits phase transition-like phenomena, through which the smallest possible bin size (or the highest resolution) can be estimated automatically and empirically, without manually setting a threshold value. The Hi-C datasets integrated here were retrieved from GEO under the IDs GSE260760 and GSE255264. To our knowledge, ours is the first method that can optimize bin sizes over multiple Hi-C profiles without any tunable parameters.
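
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: contact matrices from several datasets are coarsened to a shared bin size, stacked into a bins x bins x datasets tensor, and the dataset mode is examined through an SVD of the unfolded tensor (an HOSVD-style step). The helper names `rebin`, `dataset_mode_vectors`, and `commonality`, the synthetic Poisson data, and the cosine-to-constant-vector score are all hypothetical illustrations. They are not the authors' code or their criterion for the phase transition-like behavior.

```python
import numpy as np


def rebin(contacts, factor):
    """Coarsen a square contact matrix by summing factor x factor blocks."""
    n = (contacts.shape[0] // factor) * factor  # drop trailing bins that do not fill a block
    m = n // factor
    return contacts[:n, :n].reshape(m, factor, m, factor).sum(axis=(1, 3))


def dataset_mode_vectors(matrices):
    """Stack same-sized matrices into a bins x bins x datasets tensor and
    return the singular values and singular vectors of the dataset mode."""
    tensor = np.stack(matrices, axis=-1).astype(float)
    unfolded = tensor.reshape(-1, tensor.shape[-1])  # (bins * bins) x datasets
    _, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    return s, vt  # rows of vt are the dataset-mode singular vectors


def commonality(matrices):
    """Hypothetical score: |cosine| between the leading dataset-mode vector
    and a constant vector (1.0 means all datasets contribute equally)."""
    _, vt = dataset_mode_vectors(matrices)
    lead = vt[0]
    const = np.ones_like(lead) / np.sqrt(lead.size)
    return abs(float(lead @ const))


# Synthetic stand-in for Hi-C data (an assumption, not real GEO data):
# one shared symmetric pattern plus dataset-specific counting noise.
rng = np.random.default_rng(0)
base = rng.poisson(5.0, size=(64, 64))
shared = base + base.T
datasets = [shared + rng.poisson(2.0, size=(64, 64)) for _ in range(3)]

for factor in (1, 2, 4, 8):  # larger factor = larger bin size = lower resolution
    score = commonality([rebin(d, factor) for d in datasets])
    print(f"bin factor {factor}: commonality {score:.4f}")
```

With real data, one would scan such a score (or the singular vectors themselves) across bin sizes and look for the resolution below which the dataset-common component stops being the leading one. How that point is detected in the paper is not stated in the abstract.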

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/89d96cdc474b/41598_2025_91355_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/a755154ec07d/41598_2025_91355_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/8b9c8aaaa3ca/41598_2025_91355_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/56a1d9d08777/41598_2025_91355_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/39eda27e9e8f/41598_2025_91355_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/1b6285f20da4/41598_2025_91355_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/1f9bec11a62c/41598_2025_91355_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6181/11876364/7b94a4dd81b6/41598_2025_91355_Fig8_HTML.jpg

Similar articles

1
Novel AI-powered computational method using tensor decomposition for identification of common optimal bin sizes when integrating multiple Hi-C datasets.
Sci Rep. 2025 Mar 3;15(1):7459. doi: 10.1038/s41598-025-91355-8.
2
scHiCEmbed: Bin-Specific Embeddings of Single-Cell Hi-C Data Using Graph Auto-Encoders.
Genes (Basel). 2022 Jun 11;13(6):1048. doi: 10.3390/genes13061048.
3
Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.
Genes (Basel). 2020 Dec 11;11(12):1493. doi: 10.3390/genes11121493.
4
Tensor-Decomposition-Based Unsupervised Feature Extraction in Single-Cell Multiomics Data Analysis.
Genes (Basel). 2021 Sep 18;12(9):1442. doi: 10.3390/genes12091442.
5
A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods.
Genes (Basel). 2023 Dec 29;15(1):54. doi: 10.3390/genes15010054.
6
Universal Nature of Drug Treatment Responses in Drug-Tissue-Wide Model-Animal Experiments Using Tensor Decomposition-Based Unsupervised Feature Extraction.
Front Genet. 2020 Aug 20;11:695. doi: 10.3389/fgene.2020.00695. eCollection 2020.
7
MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions.
PLoS Comput Biol. 2017 Jul 24;13(7):e1005647. doi: 10.1371/journal.pcbi.1005647. eCollection 2017 Jul.
8
ParticleChromo3D: a Particle Swarm Optimization algorithm for chromosome 3D structure prediction from Hi-C data.
BioData Min. 2022 Sep 21;15(1):19. doi: 10.1186/s13040-022-00305-x.
9
Novel feature selection method via kernel tensor decomposition for improved multi-omics data analysis.
BMC Med Genomics. 2022 Feb 24;15(1):37. doi: 10.1186/s12920-022-01181-4.
10
scHiClassifier: a deep learning framework for cell type prediction by fusing multiple feature sets from single-cell Hi-C data.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf009.
