IEEE J Biomed Health Inform. 2019 Sep;23(5):2091-2098. doi: 10.1109/JBHI.2018.2878878. Epub 2018 Oct 31.
Recent advances in ultra-high-throughput microscopy have enabled a new generation of cell classification methodologies that use image-based cell phenotypes alone. In contrast to current single-cell analysis techniques that rely solely on slow and costly genetic/epigenetic analysis, these image-based analyses allow morphological profiling and screening of thousands or even millions of single cells at a fraction of the cost. They have been shown to deliver the statistical significance required for understanding the role of cell heterogeneity in diverse biological applications, ranging from cancer screening to drug candidate identification and validation. This paper examines the efficacy of, and opportunities presented by, machine learning algorithms in processing large-scale datasets with millions of label-free cell images. An automatic single-cell classification framework using a convolutional neural network (CNN) has been developed. A comparative analysis of its efficiency in classifying large datasets against conventional k-nearest neighbors (kNN) and support vector machine (SVM) based methods is also presented. Experiments show that the proposed framework can efficiently identify multiple cell types with over 99% accuracy from phenotypic label-free bright-field images, and that CNN-based models perform well and remain relatively stable with respect to data volume compared with kNN and SVM.
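To make the idea of comparing classifiers "against data volume" concrete, the following is a minimal sketch of how accuracy can be measured as the training set grows. It is not the authors' framework: it uses only a plain kNN baseline, and it assumes synthetic 2-D Gaussian points as a stand-in for features extracted from cell images. All names (`make_blobs`, `knn_predict`, `accuracy_vs_volume`), the class centers, and the training-set sizes are hypothetical choices made for illustration.

```python
import math
import random


def make_blobs(n_per_class, centers, spread, rng):
    """Draw n_per_class 2-D Gaussian points around each center.

    Assumption: these points stand in for image-derived features;
    the paper itself works on bright-field cell images.
    """
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def knn_predict(train_x, train_y, query, k=5):
    """Classify one point by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


def accuracy(train_x, train_y, test_x, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


def accuracy_vs_volume(sizes, seed=0):
    """Return {training points per class: kNN accuracy} on one fixed test set."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # three hypothetical classes
    test_x, test_y = make_blobs(100, centers, 1.0, rng)
    results = {}
    for n in sizes:
        train_x, train_y = make_blobs(n, centers, 1.0, rng)
        results[n] = accuracy(train_x, train_y, test_x, test_y)
    return results


if __name__ == "__main__":
    for n, acc in accuracy_vs_volume([5, 50, 500]).items():
        print(f"{n:4d} training points per class: accuracy = {acc:.3f}")
```

Running the same loop with several classifiers on a shared test set gives one accuracy curve per model, which is the kind of comparison the abstract describes for CNN, kNN, and SVM.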