Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data.

Authors

Tjärnberg Andreas, Mahmood Omar, Jackson Christopher A, Saldi Giuseppe-Antonio, Cho Kyunghyun, Christiaen Lionel A, Bonneau Richard A

Affiliations

Center for Developmental Genetics, New York University, New York, New York, USA.

Center For Genomics and Systems Biology, NYU, New York, New York, USA.

Publication information

PLoS Comput Biol. 2021 Jan 7;17(1):e1008569. doi: 10.1371/journal.pcbi.1008569. eCollection 2021 Jan.

Abstract

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
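
The core idea in the abstract, choosing the neighborhood size of a weighted kNN denoiser with a self-supervised objective rather than a heuristic, can be illustrated with a small sketch. The Python below is not the DEWÄKSS implementation and does not use its API. The function names (`knn_weights`, `self_supervised_mse`, `tune_k`), the Gaussian weighting, the synthetic two-cluster data, and the specific objective (mean squared error between each cell and the weighted average of its neighbors, with the cell itself excluded) are all assumptions made for illustration.

```python
import numpy as np


def knn_weights(X, k):
    """Row-normalized weighted kNN matrix with a zero diagonal (no self-edges)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between cells (rows of X).
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest cells
    nd2 = np.take_along_axis(d2, idx, axis=1)
    # Assumed weighting: Gaussian kernel scaled by each cell's mean neighbor distance.
    sigma2 = nd2.mean(axis=1, keepdims=True) + 1e-12
    w = np.exp(-nd2 / sigma2)
    w /= w.sum(axis=1, keepdims=True)  # each row sums to 1
    W = np.zeros_like(d2)
    np.put_along_axis(W, idx, w, axis=1)
    return W


def self_supervised_mse(X, k):
    """Error when each cell is predicted from its neighbors only (self excluded)."""
    W = knn_weights(X, k)
    return float(np.mean((W @ X - X) ** 2))


def tune_k(X, candidates):
    """Return the candidate k with the lowest self-supervised error, plus all losses."""
    losses = {k: self_supervised_mse(X, k) for k in candidates}
    best = min(losses, key=losses.get)
    return best, losses


# Synthetic example (hypothetical data): two cell populations, 20 "genes" each.
rng = np.random.default_rng(0)
centers = np.repeat(np.array([0.0, 5.0]), 100)   # 100 cells per population
truth = np.tile(centers[:, None], (1, 20))       # noise-free expression
X = truth + rng.normal(size=truth.shape)         # observed, noisy expression

candidates = [2, 5, 10, 20, 50]
best_k, losses = tune_k(X, candidates)

W = knn_weights(X, best_k)
denoised = W @ X                                 # one step of weighted neighbor averaging
noisy_err = float(np.mean((X - truth) ** 2))
denoised_err = float(np.mean((denoised - truth) ** 2))
print("selected k:", best_k)
print("error vs. truth, noisy:", noisy_err, "denoised:", denoised_err)
```

Excluding each cell from its own neighborhood is what prevents the objective from being trivially minimized by leaving the data unchanged. The sketch builds the graph from the same noisy matrix it denoises and applies a single averaging step with no diffusion; both are simplifications chosen to keep the example short.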
