A study on quality control using delta data with machine learning technique.

Authors

Liang Yufang, Wang Zhe, Huang Dawei, Wang Wei, Feng Xiang, Han Zewen, Song Biao, Wang Qingtao, Zhou Rui

Affiliations

Department of Laboratory Medicine, Beijing Chao-yang Hospital, Capital Medical University, Beijing, PR China.

Inner Mongolia Wesure Date Technology Co., Ltd, Inner Mongolia, PR China.

Publication information

Heliyon. 2022 Jul 14;8(8):e09935. doi: 10.1016/j.heliyon.2022.e09935. eCollection 2022 Aug.

PMID:35965972

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9363967/

Abstract

BACKGROUND

In the big data era, patient-based real-time quality control (PBRTQC), an emerging quality control (QC) method, is being adopted increasingly across the clinical laboratory industry. However, the main issue with current PBRTQC methodology is data stability. Our study aimed to explore a novel protocol that improves data stability by combining delta data with a machine learning (ML) technique, thereby improving the capacity to detect QC events.
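The abstract does not define delta data. A common reading, assumed here, is the difference between a patient's current result and that same patient's previous result for the same analyte. The minimal Python sketch below computes such deltas; the `Result` record, its field names, and the use of an absolute (rather than percentage) difference are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical record layout for one laboratory result.
    patient_id: str
    analyte: str   # e.g. "HGB"
    time: float    # collection time, any sortable unit
    value: float


def delta_data(results):
    """Pair each result with the same patient's previous result for the same analyte.

    Returns (patient_id, analyte, time, delta) tuples, where delta = current - previous
    (an assumed definition). A patient's first result for an analyte has no predecessor,
    so it produces no delta.
    """
    by_key = defaultdict(list)
    for r in results:
        by_key[(r.patient_id, r.analyte)].append(r)

    deltas = []
    for (pid, analyte), series in by_key.items():
        series.sort(key=lambda r: r.time)
        # Walk consecutive pairs in time order for this patient and analyte.
        for prev, cur in zip(series, series[1:]):
            deltas.append((pid, analyte, cur.time, cur.value - prev.value))

    # Restore one chronological stream across all patients.
    deltas.sort(key=lambda d: d[2])
    return deltas


results = [
    Result("p1", "HGB", 1.0, 130.0),
    Result("p2", "HGB", 2.0, 110.0),
    Result("p1", "HGB", 3.0, 125.0),
    Result("p2", "HGB", 4.0, 112.0),
]
print(delta_data(results))
# → [('p1', 'HGB', 3.0, -5.0), ('p2', 'HGB', 4.0, 2.0)]
```

Working on deltas rather than raw values removes much of the between-patient variation, which is one plausible reason such data could be more stable for QC monitoring.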

METHODS

A data set of 423,290 laboratory results from patients at Beijing Chao-yang Hospital in 2019 was split into a training set (n = 380,960, 90%) and an internal validation set (n = 42,330, 10%). A further 22,460 results from patients at Beijing Long-fu Hospital in 2019 were used as a test set. Three types of data were evaluated: (1) single-type data processed by truncation limits; (2) delta-type data processed by truncation limits; and (3) delta-type data processed by the Isolation Forest (IF) algorithm. Each was assessed by accuracy, sensitivity, NPed, and other metrics, and compared with previously published statistical methods.
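The methods compare two ways of cleaning delta data before modelling: fixed truncation limits and the IF (Isolation Forest) algorithm. The sketch below shows both on a toy list of deltas, assuming inclusive limits for truncation and a compact pure-Python Isolation Forest (random splits between the sample minimum and maximum, average path length converted to a score in (0, 1]). The limits, tree count, sample size, and the 0.7 cut-off are hypothetical; the study's actual settings are not given in the abstract.

```python
import math
import random


def truncate(values, lower, upper):
    """Truncation limits: keep only values inside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build(values, depth, max_depth, rng):
    # Stop when the node is isolated (or empty) or the height limit is reached.
    if depth >= max_depth or len(values) <= 1:
        return ("leaf", len(values))
    lo, hi = min(values), max(values)
    if lo == hi:  # identical values cannot be split further
        return ("leaf", len(values))
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, node, depth=0):
    if node[0] == "leaf":
        # A leaf holding k points adds the expected depth of the unbuilt subtree, c(k).
        return depth + _c(node[1])
    _, split, left, right = node
    return _path_length(x, left if x < split else right, depth + 1)


def isolation_scores(values, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per value; scores near 1 are isolated quickly (likely outliers)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(values, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = _c(psi)
    scores = []
    for x in values:
        mean_path = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


deltas = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 8.0]
print(truncate(deltas, -1.0, 1.0))  # → [0.1, -0.2, 0.05, 0.0, 0.15, -0.1]
scores = isolation_scores(deltas)
kept = [d for d, s in zip(deltas, scores) if s < 0.7]  # 0.7 is a hypothetical cut-off
print(kept)
```

Truncation needs the limits chosen in advance, whereas Isolation Forest ranks points by how easily random splits separate them, so the delta of 8.0 should receive the highest score without any fixed limit. In practice a maintained implementation such as scikit-learn's `IsolationForest` would normally replace this sketch.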

RESULTS

The optimal model was based on the Random Forest (RF) algorithm using delta-type data processed by the IF algorithm. On the independent test set, the model achieved better performance, with accuracy, sensitivity, specificity, and AUC all at 0.99, surpassing the critical bias of PBRTQC by over 50%. For the lymphocyte count (LYMPH#), hemoglobin (HGB), and platelet count (PLT), the cumulative MNPed of machine-learning-based QC (MLQC) was reduced by 95.43%, 97.39%, and 97.97%, respectively, compared with the best-performing PBRTQC method.
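The results are reported as accuracy, sensitivity, specificity, AUC, and percentage reductions in cumulative MNPed (commonly expanded as the median number of patient results until error detection). The helpers below sketch how such figures are usually computed. They assume the reduction is relative, (baseline - MLQC) / baseline × 100, and every input number in the example is hypothetical, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall) and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(pos_scores, neg_scores):
    """ROC AUC as P(positive score > negative score), with ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def percent_reduction(baseline, new):
    """Relative reduction of `new` against `baseline`, in percent (assumed definition)."""
    return 100.0 * (baseline - new) / baseline


print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
# → {'accuracy': 0.925, 'sensitivity': 0.9, 'specificity': 0.95}
print(auc([0.9, 0.8], [0.1, 0.8]))  # → 0.875
print(percent_reduction(200, 10))   # → 95.0
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; for data sets of the size used here a rank-based or library routine would be the practical choice.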

CONCLUSION

The final results indicate that integrating an innovative ML algorithm with the overall data-processing protocol improves the detection of QC events.
