Tsinghua University, Beijing National Research Center for Information Science and Technology, Depart, China.
J Biomed Opt. 2020 Jun;25(6):1-12. doi: 10.1117/1.JBO.25.6.066001.
Optofluidic time-stretch flow cytometry enables extreme-throughput cell imaging, but capturing and processing the large amount of data it produces is difficult. Because image data are generated continuously and in large volumes, the images must be identified at high speed.
We present an intelligent cell phenotyping framework for high-throughput optofluidic time-stretch microscopy based on the XGBoost algorithm, which classifies the acquired cell images rapidly and accurately. The image recognition pipeline consists of outlier detection by density-based spatial clustering of applications with noise (DBSCAN), a fused feature that combines histograms of oriented gradients (HOG) with a gray-level histogram, and XGBoost classification.
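The fused feature described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, bin counts, and the 8-bit intensity range are assumptions, and the gradient part is a deliberately simplified stand-in for a full histogram-of-oriented-gradients descriptor (one orientation histogram for the whole image, with no cells or block normalization). The outlier-detection and XGBoost stages are not shown.

```python
import math


def gray_histogram(image, bins=16, max_value=255):
    """Normalized histogram of pixel intensities (assumes values in 0..max_value)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            index = min(int(value * bins / (max_value + 1)), bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]


def orientation_histogram(image, bins=9):
    """Simplified HOG-like descriptor: a single magnitude-weighted histogram
    of unsigned gradient orientations (0..180 degrees) over the whole image."""
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central difference in x
            gy = image[y + 1][x] - image[y - 1][x]  # central difference in y
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))  # L2 normalization
    return [v / norm for v in hist] if norm > 0 else hist


def fused_feature(image, hog_bins=9, gray_bins=16):
    """Concatenate the orientation histogram and the gray-level histogram."""
    return orientation_histogram(image, hog_bins) + gray_histogram(image, gray_bins)


if __name__ == "__main__":
    # Toy 8x8 image with a vertical edge: dark left half, bright right half.
    toy = [[0] * 4 + [255] * 4 for _ in range(8)]
    print(len(fused_feature(toy)))
```

Concatenation is the simplest way to fuse the two descriptors. A feature vector of this kind, computed per image, is what a gradient-boosted classifier such as XGBoost would take as input.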
We compared this framework with previously proposed or commonly used algorithms on the task of phenotyping two groups of cell images. We quantified classification ability by the area under the curve (AUC) and computational complexity by test runtime. The cell image datasets were acquired by high-throughput imaging of over 20,000 drug-treated and untreated cells with an optofluidic time-stretch microscope.
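As an illustration of the two evaluation measures, the following sketch computes AUC from labels and classifier scores using the rank-based (Mann-Whitney) formulation and times a prediction function on a test set. The helper names `roc_auc` and `time_predictions` are hypothetical, and the abstract does not state how the authors computed either quantity.

```python
import time


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count one half). Labels are 0 or 1."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1  # pairs[i:j] share the same score
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += average_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def time_predictions(predict, samples):
    """Run `predict` on every sample; return the scores and the elapsed seconds."""
    start = time.perf_counter()
    scores = [predict(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    return scores, elapsed


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores, seconds = time_predictions(lambda s: s, [0.1, 0.4, 0.35, 0.8])
    print(roc_auc(labels, scores), seconds)
```

Dividing the number of test images by the elapsed time gives a throughput in cells per second, which can be compared directly across classifiers.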
The framework outperforms the other methods, with an accuracy of over 97% and a classification rate of 3000 cells/s. In addition, we determined the optimal training set structure by comparing model performance across different training set compositions.
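To put the reported rate in perspective, the arithmetic below converts 3000 cells/s into a per-image time budget and into the time needed for 20,000 images. The second figure is only illustrative: it assumes that every image in a set of that size is classified at the stated rate, which the abstract does not claim.

```latex
t_{\text{cell}} = \frac{1}{3000\ \text{s}^{-1}} \approx 0.33\ \text{ms},
\qquad
T_{20000} = \frac{20000}{3000\ \text{s}^{-1}} \approx 6.7\ \text{s}
```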
The proposed XGBoost-based framework is a promising solution for processing large volumes of flow image data. This work provides a foundation for future cell sorting and for the clinical use of high-throughput imaging cytometers.