Lannin Timothy B, Thege Fredrik I, Kirby Brian J
Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, U.S.A.
Department of Biomedical Engineering, Cornell University, Ithaca, NY, U.S.A.
Cytometry A. 2016 Oct;89(10):922-931. doi: 10.1002/cyto.a.22993. Epub 2016 Oct 18.
Advances in rare cell capture technology have made it possible to interrogate circulating tumor cells (CTCs) captured from whole patient blood. However, locating captured cells in the device by manual counting is a bottleneck in data processing: it is tedious (hours per sample), and its inconsistency and susceptibility to user bias compromise the results. Some recent work has addressed these problems by automating cell location and classification, employing image processing and machine learning (ML) algorithms to locate and classify cells in fluorescence microscope images. However, the choice of ML method is a part of the design space that has not been thoroughly explored. We therefore trained four ML algorithms on three different datasets. The trained ML algorithms locate and classify thousands of possible cells in a few minutes rather than a few hours, an order of magnitude increase in processing speed. Furthermore, some algorithms have a significantly (P < 0.05) higher area under the receiver operating characteristic curve than others. Additionally, significant (P < 0.05) losses in performance occur when training on cell lines and testing on CTCs (and vice versa), indicating the need to train on a system that is representative of future unlabeled data. Optimal algorithm selection depends on the peculiarities of the individual dataset, indicating the need for careful comparison and optimization of algorithms for individual image classification tasks. © 2016 International Society for Advancement of Cytometry.
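The abstract compares classifiers by area under the receiver operating characteristic curve (AUC). As a minimal sketch of that metric only, the snippet below computes AUC as the probability that a randomly chosen positive example (a true cell) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and both score lists are hypothetical illustrations, not the authors' code or data, and the significance test behind the reported P < 0.05 is not reproduced here.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values (1 = cell, 0 = not a cell).
    scores: iterable of classifier scores, higher meaning "more likely a cell".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical candidate objects: 3 true cells, 5 non-cells.
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical scores from two different classifiers on the same candidates.
scores_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
scores_b = [0.9, 0.6, 0.5, 0.7, 0.65, 0.2, 0.1, 0.55]

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"classifier A AUC = {auc_a:.3f}")
print(f"classifier B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of candidates, which is fine for illustration. For thousands of candidate cells, a rank-based formulation or a library routine would be the more practical choice.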