Prediction of SARS-CoV-2-positivity from million-scale complete blood counts using machine learning.

Author information

Zuin Gianlucca, Araujo Daniella, Ribeiro Vinicius, Seiler Maria Gabriella, Prieto Wesley Heleno, Pintão Maria Carolina, Dos Santos Lazari Carolina, Granato Celso Francisco Hernandes, Veloso Adriano

Affiliations

Universidade Federal de Minas Gerais, CS Dept., Belo Horizonte, Brazil.

Kunumi, Belo Horizonte, Brazil.

Publication information

Commun Med (Lond). 2022 Jun 15;2:72. doi: 10.1038/s43856-022-00129-0. eCollection 2022.

Abstract

BACKGROUND

The Complete Blood Count (CBC) is a commonly used low-cost test that measures white blood cells, red blood cells, and platelets in a person's blood. It is a useful tool to support medical decisions, as intrinsic variations of each analyte bring relevant insights regarding potential diseases. In this study, we aimed at developing machine learning models for COVID-19 diagnosis through CBCs, unlocking the predictive power of non-linear relationships between multiple blood analytes.

METHODS

We collected 809,254 CBCs and 1,088,385 RT-PCR tests for SARS-CoV-2, of which 21% (234,466) were positive, from 900,220 unique individuals. To properly screen for COVID-19, we also collected 120,807 CBCs from 16,940 individuals who tested positive for other respiratory viruses. We proposed an ensemble procedure that combines machine learning models for different respiratory infections and analyzed the results in both the first and second waves of COVID-19 cases in Brazil.
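The abstract does not specify how the per-infection models are combined, so the following is only a hypothetical sketch of one possible scheme, not the authors' procedure. It assumes each model maps a vector of CBC analyte values to a probability, and it discounts the COVID-19 score when a model for another respiratory virus is also confident. The names `ensemble_covid_score`, `covid_model`, and `other_models`, as well as the combination rule itself, are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

# Assumption: a "model" is any callable that maps CBC analyte values to a probability.
Model = Callable[[Sequence[float]], float]


def ensemble_covid_score(
    cbc: Sequence[float],
    covid_model: Model,
    other_models: Dict[str, Model],
) -> float:
    """Combine a COVID-19 model with models for other respiratory viruses.

    Hypothetical rule: multiply the COVID-19 probability by the probability
    that no other-virus model fires, using the strongest competing score.
    """
    p_covid = covid_model(cbc)
    # Strongest evidence for any other respiratory infection (0.0 if none given).
    p_other = max((model(cbc) for model in other_models.values()), default=0.0)
    return p_covid * (1.0 - p_other)


# Toy stand-in models with fixed outputs, for illustration only.
toy_cbc = [7.2, 4.8, 250.0]  # made-up analyte values, not real data
toy_covid = lambda cbc: 0.8
toy_others = {"influenza": lambda cbc: 0.5, "rsv": lambda cbc: 0.25}

combined = ensemble_covid_score(toy_cbc, toy_covid, toy_others)
```

In this toy setup the COVID-19 score of 0.8 is halved because the hypothetical influenza model reports 0.5, which illustrates how information about other infections could temper a model trained only on SARS-CoV-2 data.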

RESULTS

We obtain an AUROC above 90% in validation for both scenarios. We show that models built solely on SARS-CoV-2 data are biased, performing poorly in the presence of infections caused by other RNA respiratory viruses.
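For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition on made-up labels and scores; the function name `auroc` and the toy data are illustrative assumptions and have no connection to the study's data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples; fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

Here three of the four positive-negative pairs are ordered correctly, so the toy AUROC is 0.75; a value above 0.9 means more than nine in ten such pairs are ordered correctly.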

CONCLUSIONS

We demonstrate the potential of a novel machine learning approach for COVID-19 diagnosis based on a CBC and show that aggregating information about other respiratory diseases was essential to guarantee robust results. Given its versatility, low cost, and speed, we believe that our tool can be particularly useful in a variety of scenarios, both during the pandemic and after.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42a7/9200753/d71a50c3eaf5/43856_2022_129_Fig1_HTML.jpg
