Ultrafast clustering of single-cell flow cytometry data using FlowGrid.

Author information

Ye Xiaoxin, Ho Joshua W K

Affiliations

Victor Chang Cardiac Research Institute, Sydney, Australia.

University of New South Wales, Sydney, Australia.

Publication information

BMC Syst Biol. 2019 Apr 5;13(Suppl 2):35. doi: 10.1186/s12918-019-0690-2.

Abstract

BACKGROUND

Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It can measure the expression of tens of cell surface protein markers in millions of single cells, which makes it a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them do not scale to very large data sets with more than ten million cells.

RESULTS

Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. This new clustering algorithm is implemented in Python as an open source package, FlowGrid. FlowGrid is memory efficient and scales linearly with respect to the number of cells. We have evaluated the performance of FlowGrid against other state-of-the-art clustering programs and found that FlowGrid produces similar clustering results in substantially less time. For example, FlowGrid is able to complete a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error.
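To illustrate the general idea of combining grid binning with DBSCAN-style density connectivity, the following is a minimal Python sketch. It is not the FlowGrid implementation and does not use the FlowGrid package API: the function name `grid_density_cluster` and the parameters `n_bins` and `min_count` are hypothetical, and the neighbourhood rule (bins that differ by at most one step in every dimension) is an assumed simplification. Cells are first assigned to equal-width bins, bins holding at least `min_count` cells are treated as dense, and adjacent dense bins are merged into one cluster with a breadth-first search.

```python
from collections import deque
from itertools import product

import numpy as np


def grid_density_cluster(points, n_bins=4, min_count=3):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Illustrative sketch only; names and rules here are assumptions,
    not the FlowGrid package API.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant columns

    # Map every point to an integer bin coordinate in [0, n_bins - 1].
    bins = np.minimum((n_bins * (points - lo) / span).astype(int), n_bins - 1)

    # Keep only occupied bins, with the number of points in each.
    uniq, inverse, counts = np.unique(
        bins, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)

    # Dense bins play the role of DBSCAN core points.
    dense = {
        tuple(b.tolist()): i
        for i, (b, c) in enumerate(zip(uniq, counts))
        if c >= min_count
    }

    # Neighbour offsets: every bin within one step in each dimension.
    offsets = [o for o in product((-1, 0, 1), repeat=points.shape[1]) if any(o)]

    bin_label = np.full(len(uniq), -1)
    next_label = 0
    for start, idx in dense.items():
        if bin_label[idx] != -1:
            continue  # already reached from an earlier dense bin
        bin_label[idx] = next_label
        queue = deque([start])
        while queue:  # breadth-first search over adjacent dense bins
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                j = dense.get(nb)
                if j is not None and bin_label[j] == -1:
                    bin_label[j] = next_label
                    queue.append(nb)
        next_label += 1

    # Each point inherits the label of its bin.
    return bin_label[inverse]
```

In this sketch the connectivity search runs over occupied bins rather than individual cells, which is where a grid-based approach saves work on large data sets. The sketch itself is not tuned for speed: `np.unique` sorts the bin coordinates, and the neighbour offset list grows as 3^d with the number of markers d, so a practical implementation would need a cheaper neighbour search.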

CONCLUSIONS

FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid.
