From the Hematology Department, Synlab Global Diagnostics, Barcelona, Spain (Bigorra, Larriba).
The Department of Experimental & Health Sciences, Pompeu Fabra University, Barcelona, Spain (Bigorra, Gutiérrez-Gallego).
Arch Pathol Lab Med. 2022 Aug 1;146(8):1024-1031. doi: 10.5858/arpa.2021-0044-OA.
CONTEXT.—: The goal of the diagnostic approach to lymphocytosis is to classify it as benign or neoplastic. Nevertheless, a nonnegligible percentage of laboratories fail to make that classification.
OBJECTIVE.—: To design and develop a machine learning model for classifying lymphocytosis by using objective data from the DxH 800 analyzer (cell population data, leukocyte and absolute lymphoid counts, hemoglobin concentration, and platelet counts) together with age and sex.
DESIGN.—: A total of 1565 samples were included from 10 different lymphoid categories grouped into 4 diagnostic categories: normal controls (458), benign causes of lymphocytosis (567), neoplastic lymphocytosis (399), and spurious causes of lymphocytosis (141). The data set was distributed in a 60-20-20 scheme for training, testing, and validation stages. Six machine learning models were built and compared, and the selection of the final model was based on the minimum generalization error and 10-fold cross validation accuracy.
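The partitioning described above can be illustrated with a minimal sketch. The abstract does not say whether the split was stratified or how the folds were drawn, so the plain shuffle, the seed, and the function names `split_60_20_20` and `k_folds` are all assumptions made for illustration, not the authors' procedure.

```python
import random


def split_60_20_20(n_samples, seed=0):
    """Shuffle sample indices and cut them into 60% / 20% / 20% partitions.

    Assumption: a simple unstratified shuffle; the source does not specify.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    first_cut = n_samples * 60 // 100   # end of the training partition
    second_cut = n_samples * 80 // 100  # end of the testing partition
    return indices[:first_cut], indices[first_cut:second_cut], indices[second_cut:]


def k_folds(indices, k=10):
    """Deal indices round-robin into k disjoint folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


# 1565 is the total sample count reported in the abstract.
train, test, validation = split_60_20_20(1565)
folds = k_folds(train, k=10)

print(len(train), len(test), len(validation))  # 939 313 313
print([len(fold) for fold in folds])           # nine folds of 94, one of 93
```

With 1565 samples, this yields 939 training, 313 testing, and 313 validation cases. In each of the 10 cross-validation rounds, one fold would be held out and the model fitted on the remaining nine.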
RESULTS.—: The selected neural network classifier achieved a global 10-class validation accuracy of 89.9%; when the 10 classes were grouped into the aforementioned 4 diagnostic categories, the diagnostic impact accuracy was 95.8%. Finally, a prospective proof of concept with 100 new cases yielded a global diagnostic accuracy of 91%.
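The gap between the 10-class accuracy and the 4-category accuracy is what one would expect if errors between classes of the same diagnostic category stop counting once the labels are grouped. The sketch below illustrates that reading on toy data. The class names, the `CLASS_TO_CATEGORY` mapping, and the example labels are hypothetical, since the abstract does not list the 10 classes, and the exact definition of "diagnostic impact accuracy" is assumed here.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical fine-grained classes mapped to the 4 diagnostic categories.
CLASS_TO_CATEGORY = {
    "normal": "normal",
    "benign_a": "benign",
    "benign_b": "benign",
    "neoplastic_a": "neoplastic",
    "neoplastic_b": "neoplastic",
    "spurious": "spurious",
}


def grouped_accuracy(true_labels, predicted_labels, mapping):
    """Accuracy after collapsing each fine-grained label to its category."""
    true_grouped = [mapping[label] for label in true_labels]
    predicted_grouped = [mapping[label] for label in predicted_labels]
    return accuracy(true_grouped, predicted_grouped)


# Toy data, not results from the study.
true_labels = ["benign_a", "benign_b", "neoplastic_a", "normal"]
predicted_labels = ["benign_b", "benign_b", "neoplastic_b", "spurious"]

fine = accuracy(true_labels, predicted_labels)
coarse = grouped_accuracy(true_labels, predicted_labels, CLASS_TO_CATEGORY)
print(fine, coarse)  # 0.25 0.75
```

In the toy data, two of the three fine-grained errors stay within the same diagnostic category, so they vanish after grouping. Only the normal-versus-spurious confusion still counts as an error.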
CONCLUSIONS.—: The proposed machine learning model was feasible, with a high benefit-cost ratio, because the results are obtained as part of the complete blood count with differential. Finally, the high accuracies and diagnostic impact in both model validation and proof of concept encourage exploration of the model for real-world application on a daily basis.