Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods.

Author information

Krzak Monika, Raykov Yordan, Boukouvalas Alexis, Cutillo Luisa, Angelini Claudia

Affiliations

Institute for Applied Mathematics "Mauro Picone", Naples, Italy.

Department of Mathematics, Aston University, Birmingham, United Kingdom.

Publication information

Front Genet. 2019 Dec 11;10:1253. doi: 10.3389/fgene.2019.01253. eCollection 2019.

Abstract

Single-cell RNA-seq (scRNAseq) is a powerful tool for studying cellular heterogeneity. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods rely on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be run in different modes on the same dataset depending on the user's choices. The large number of options these methods offer can intimidate non-expert users, especially because the available choices are not always clearly documented. In addition, to date, no large study has investigated the role and impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of options offered to users. In particular, we provide an extensive evaluation of several methods under different modes of usage and parameter settings, applying them to real and simulated datasets that vary in dimensionality, number of cell populations, and level of noise. Remarkably, the results presented here show that the large variability in the models' performance is strongly attributable to the user's choice of parameter settings. We describe several performance trends associated with the modes of usage and with different types of datasets, and identify which methods' computational time is strongly affected by data dimensionality. Finally, we highlight some open challenges in scRNAseq data clustering, such as identifying the number of clusters.
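
The abstract does not state which evaluation metric or clustering internals the benchmark uses, so the following is only a minimal, hedged sketch of what a parameter-sensitivity sweep can look like. It assumes plain k-means (a stand-in, not any of the scRNAseq methods evaluated in the paper) applied to hypothetical simulated 2-D "cell populations", varies two user-facing settings (the number of clusters k and the noise level), and scores each run against the known labels with the Adjusted Rand Index (ARI), a common agreement measure chosen here for illustration. All function names (simulate, best_kmeans, sensitivity, etc.) are hypothetical.

```python
import random
from collections import Counter


def comb2(n):
    """Number of unordered pairs, C(n, 2)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat partitions of the same cells."""
    n = len(labels_true)
    sum_ij = sum(comb2(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_true).values())
    sum_b = sum(comb2(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, rng, n_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns (labels, inertia)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next seed with probability proportional to D^2.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def best_kmeans(points, k, seed, n_init=5):
    """Run k-means n_init times and keep the labelling with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda r: r[1])[0]


def simulate(n_per_group, noise_sd, seed):
    """Hypothetical toy data: three Gaussian populations in 2-D with known labels."""
    rng = random.Random(seed)
    means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points, truth = [], []
    for g, (mx, my) in enumerate(means):
        for _ in range(n_per_group):
            points.append((rng.gauss(mx, noise_sd), rng.gauss(my, noise_sd)))
            truth.append(g)
    return points, truth


def sensitivity(noise_levels, ks, seed=0):
    """ARI for every (noise level, k) combination: a tiny parameter sweep."""
    results = {}
    for sd in noise_levels:
        points, truth = simulate(50, sd, seed)
        for k in ks:
            results[(sd, k)] = adjusted_rand_index(truth, best_kmeans(points, k, seed))
    return results


if __name__ == "__main__":
    for (sd, k), ari in sorted(sensitivity([0.5, 3.0], [2, 3, 4, 5]).items()):
        print(f"noise_sd={sd:<4} k={k}  ARI={ari:.3f}")
```

Even in this toy setting, misspecifying k lowers the ARI although the data are unchanged, which is the kind of user-parameter effect the abstract describes; a real benchmark would substitute the actual scRNAseq pipelines, preprocessing options and datasets for these placeholders.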

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6164/6918801/1e2963bdeed7/fgene-10-01253-g001.jpg
