Boosting data interpretation with GIBOOST to enhance visualization of complex high-dimensional data.

Author information

Atitey Komlan, Li Jiaqi, Papas Brian, Egbon Osafu A, Li Jian-Liang, Kana Musa, Aimola Idowu, Anchang Benedict

Affiliations

Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T W Alexander Dr, Research Triangle Park, Durham, NC 27709, United States.

Department of Community Medicine, Faculty of Clinical Sciences, College of Medicine, Kaduna State University, Tafawa Balewa Way, Kaduna, Kaduna State, 800241, Nigeria.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf415.

Abstract

High-dimensional single-cell data analysis is crucial for understanding complex biological interactions, yet conventional dimensionality reduction methods (DRMs) often fail to preserve both global and local structures. Existing DRMs, such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), Principal Component Analysis (PCA), and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), optimize different visualization objectives, resulting in trade-offs between cluster separability, spatial organization, and temporal coherence. To overcome these limitations, we introduce GIBOOST, an AI-driven framework that integrates outputs from multiple DRMs using a Bayesian framework and an optimized autoencoder. GIBOOST systematically selects and integrates the two most informative DRMs by evaluating key visualization features, including separability, spatial continuity, uniformity, cellular dynamics, and cluster sensitivity. Rather than prioritizing a single DRM, it identifies the optimal combination that maximizes clustering sensitivity (GI) while preserving biologically relevant spatial and temporal structures. This integration is further refined through a GI-optimized autoencoder, which optimizes the joint distribution of GI, neuron count, and batch size effects to improve visualization quality. We demonstrate GIBOOST's efficacy across multiple dynamic biological processes, including epithelial-mesenchymal transition, CiPSC reprogramming, spermatogenesis, and placental development. Compared to nine individual DRMs, GIBOOST enhances clustering sensitivity and biological relevance by 30%, enabling more accurate interpretation of differentiation trajectories and cell-cell interactions. 
When applied to a large single-cell RNA-seq dataset (400 000 cells, 28 cell types, seven placental regions), GIBOOST uncovers novel immune-placenta interactions, providing deeper insights into cross-tissue communication during pregnancy. By improving both the visualization and interpretability of high-dimensional data, GIBOOST serves as a powerful tool for computational systems biology, enabling a more accurate exploration of complex cellular systems.
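
The pair-selection idea described in the abstract (score candidate DRM outputs, then keep the best-performing combination of two) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy coordinates are not real DRM outputs, `separability` is a simple stand-in score and not the paper's GI metric, and plain concatenation replaces the Bayesian and autoencoder integration, which the abstract does not specify in detail.

```python
from itertools import combinations
from statistics import mean
import math


def separability(points, labels):
    """Toy score: mean distance between cluster centroids divided by the
    mean distance of each point to its own centroid (higher is better).
    This is a stand-in for illustration, NOT the GI metric from the paper."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centroids = {
        label: tuple(mean(axis) for axis in zip(*members))
        for label, members in groups.items()
    }
    within = mean(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))
    between = mean(math.dist(a, b) for a, b in combinations(centroids.values(), 2))
    return between / within if within > 0 else float("inf")


def concat(emb_a, emb_b):
    """Join two embeddings cell by cell (assumed stand-in for integration)."""
    return [a + b for a, b in zip(emb_a, emb_b)]


def best_pair(embeddings, labels):
    """Score every pair of embeddings and return the top pair plus all scores."""
    scores = {
        (name_a, name_b): separability(
            concat(embeddings[name_a], embeddings[name_b]), labels
        )
        for name_a, name_b in combinations(sorted(embeddings), 2)
    }
    return max(scores, key=scores.get), scores


# Hypothetical 2-D coordinates for four cells in two known clusters.
labels = [0, 0, 1, 1]
toy = {
    "pca": [(0, 0), (0, 1), (5, 0), (5, 1)],   # clusters split along x
    "tsne": [(0, 0), (1, 0), (0, 5), (1, 5)],  # clusters split along y
    "umap": [(0, 0), (5, 0), (0, 1), (5, 1)],  # clusters poorly separated
}
best, scores = best_pair(toy, labels)
print(best, scores)
```

In this toy setup the two embeddings that each separate the clusters well form the selected pair; a real pipeline would swap in actual DRM outputs and the published scoring criteria.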
