Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data.

Author information

Cristian Padron-Manrique, Aarón Vázquez-Jiménez, Diego Armando Esquivel-Hernandez, Yoscelina Estrella Martinez-Lopez, Daniel Neri-Rosario, David Giron-Villalobos, Edgar Mixcoha, Jean Paul Sánchez-Castañeda, Osbaldo Resendis-Antonio

Affiliations

Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico.

Programa de Doctorado en Ciencias Biomédicas, Circuito Posgrados, Ciudad Universitaria, Alcaldía Coyoacán Unidad de Posgrado Edificio B primer Piso, Universidad Nacional Autónoma de México (UNAM), Mexico City 04510, Mexico.

Publication information

Biology (Basel). 2024 Jul 9;13(7):512. doi: 10.3390/biology13070512.

Abstract

Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
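
The diffusion step the abstract describes (build a k-nearest-neighbor graph on a low-dimensional embedding, turn it into a Markov matrix, exponentiate it, and apply it to the expression matrix) can be sketched as follows. This is a minimal, hypothetical illustration, not sc-PHENIX's code. It assumes a dense cells × genes matrix, uses an unweighted, symmetrized kNN graph instead of the adaptive kernels used by MAGIC or sc-PHENIX, and substitutes a plain PCA embedding for the PCA-UMAP embedding to stay dependency-free. The names `pca_embed`, `knn_markov_matrix`, and `diffuse` and the parameters `k` (neighbors) and `t` (power of the Markov matrix) are illustrative assumptions.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project cells (rows of X) onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def knn_markov_matrix(Y, k):
    """Row-stochastic Markov matrix from a symmetrized, unweighted kNN graph on Y."""
    n = Y.shape[0]
    # Pairwise squared Euclidean distances between cells in the embedding.
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k + 1 closest cells per row (normally the cell itself plus k neighbors).
    nn = np.argsort(d2, axis=1)[:, : k + 1]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k + 1), nn.ravel()] = 1.0
    np.fill_diagonal(A, 1.0)  # ensure every cell keeps a self-loop
    A = np.maximum(A, A.T)  # symmetrize: i~j if either is a neighbor of the other
    return A / A.sum(axis=1, keepdims=True)  # normalize rows to transition probabilities


def diffuse(X, Y, k=5, t=3):
    """Smooth X by t steps of a random walk on the kNN graph built from embedding Y."""
    M_t = np.linalg.matrix_power(knn_markov_matrix(Y, k), t)
    return M_t @ X


# Toy usage: 60 cells x 20 genes of Poisson "counts" (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 20)).astype(float)
Y = pca_embed(np.log1p(X), n_components=5)  # stand-in for a PCA-UMAP embedding
X_smooth = diffuse(X, Y, k=5, t=3)
```

Because every row of the exponentiated Markov matrix is a probability distribution, each smoothed value is a weighted average of observed values; raising `t` or `k` widens that average, which is the over-smoothing risk the abstract refers to. In sc-PHENIX's setting the embedding `Y` would instead be a UMAP projection of the PCA components (for example via `umap.UMAP(n_components=...).fit_transform(...)` from the umap-learn package), with `k`, `t`, and the number of PCA dimensions tuned on that embedding.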

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5828/11274112/a03ffd247bd6/biology-13-00512-g001.jpg