Dasariraju Satvik, Huo Marc, McCalla Serena
iResearch Institute, Glen Cove, NY 11542, USA.
The Lawrenceville School, Lawrenceville, NJ 08648, USA.
Bioengineering (Basel). 2020 Oct 1;7(4):120. doi: 10.3390/bioengineering7040120.
Acute myeloid leukemia (AML) is a fatal blood cancer that progresses rapidly and hinders the function of blood cells and the immune system. The current AML diagnostic method, a manual examination of the peripheral blood smear, is time-consuming, labor-intensive, and suffers from considerable inter-observer variation. Herein, a machine learning model to detect and classify immature leukocytes for efficient diagnosis of AML is presented. Images of leukocytes in AML patients and healthy controls were obtained from a publicly available dataset in The Cancer Imaging Archive. Image format conversion, multi-Otsu thresholding, and morphological operations were used for segmentation of the nucleus and cytoplasm. From each image, 16 features were extracted, two of which are new nucleus color features proposed in this study. A random forest algorithm was trained for the detection and classification of immature leukocytes. The model achieved 92.99% accuracy for detection and 93.45% accuracy for classification of immature leukocytes into four types. Precision values for each class were above 65%, which is an improvement on the current state of the art. Based on Gini importance, the nucleus to cytoplasm area ratio was a discriminative feature for both detection and classification, while the two proposed features were shown to be significant for classification. The proposed model can be used as a support tool for the diagnosis of AML, and the features identified as most important serve as a baseline for future research.
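To make the segmentation step and the nucleus to cytoplasm area ratio concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes a grayscale image flattened to a list of 8-bit intensities, and it assumes the nucleus is the darkest of three intensity classes, the cytoplasm the middle one, and the background the brightest; none of these details are stated in the abstract. The function names `multi_otsu_two_thresholds` and `nc_area_ratio` are hypothetical. The morphological clean-up and the random forest described above are omitted; in practice a library routine such as scikit-image's `threshold_multiotsu` would likely replace the brute-force search shown here.

```python
from itertools import accumulate


def multi_otsu_two_thresholds(pixels, levels=256):
    """Return (t1, t2) that split intensities into three classes
    with maximal between-class variance (brute-force multi-Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Prefix sums of pixel counts and of intensity-weighted counts,
    # so that any class [lo, hi) can be summarized in O(1).
    cum_w = [0] + list(accumulate(hist))
    cum_m = [0] + list(accumulate(i * h for i, h in enumerate(hist)))

    def class_score(lo, hi):
        # Weight times squared mean of the class with intensities in [lo, hi).
        # Maximizing the sum of these terms is equivalent to maximizing
        # between-class variance, because the global mean is constant.
        w = cum_w[hi] - cum_w[lo]
        if w == 0:
            return 0.0
        m = cum_m[hi] - cum_m[lo]
        return m * m / w

    best_score, best_t = -1.0, (0, 0)
    for t1 in range(1, levels - 1):
        for t2 in range(t1 + 1, levels):
            score = (class_score(0, t1)
                     + class_score(t1, t2)
                     + class_score(t2, levels))
            if score > best_score:
                best_score, best_t = score, (t1, t2)
    return best_t


def nc_area_ratio(pixels, t1, t2):
    """Nucleus-to-cytoplasm area ratio from pixel counts.

    Assumption (not from the source): nucleus pixels are darker than t1,
    cytoplasm pixels lie in [t1, t2), and the rest is background."""
    nucleus = sum(1 for p in pixels if p < t1)
    cytoplasm = sum(1 for p in pixels if t1 <= p < t2)
    if cytoplasm == 0:
        raise ValueError("no cytoplasm pixels found")
    return nucleus / cytoplasm


# Synthetic example: 100 dark "nucleus" pixels, 300 mid-gray "cytoplasm"
# pixels, and 600 bright "background" pixels.
pixels = [40] * 100 + [120] * 300 + [220] * 600
t1, t2 = multi_otsu_two_thresholds(pixels)
ratio = nc_area_ratio(pixels, t1, t2)  # 100 nucleus / 300 cytoplasm pixels
```

A ratio computed this way would be one entry in a per-cell feature vector of the kind the abstract describes; how the authors computed their 16 features is not specified here.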