Matysek Adrian, Studnicka Aneta, Smith Wade Menpes, Hutny Michał, Gajewski Paweł, Filipiak Krzysztof J, Goh Jorming, Yang Guang
Immunidex Ltd., London, United Kingdom.
Cognescence Ltd., London, United Kingdom.
Front Med (Lausanne). 2022 Aug 1;9:962101. doi: 10.3389/fmed.2022.962101. eCollection 2022.
Since the outbreak of the COVID-19 pandemic, interindividual variability in the course of the disease has been reported, indicating that a wide range of factors influence it. The factors most often associated with increased COVID-19 severity include older age, obesity and diabetes. The influence of the cytokine storm is complex, reflecting the complexity of the immunological processes triggered by SARS-CoV-2 infection. A worldwide pandemic calls for modern solutions, in this case harnessing machine learning to analyse differences in the clinical properties of the populations affected by the disease and to grade their significance, leading to a tool for assessing the individual risk of SARS-CoV-2 infection.
Biochemical and morphological parameter values of 5,000 patients (Curisin Healthcare, India) were gathered and used to calculate eGFR, the SII index and the N/L ratio. Spearman's rank correlation coefficient was used to assess the correlation between each feature in the population and the presence of SARS-CoV-2 infection. Feature importance was evaluated by fitting a Random Forest machine learning model to the data and examining the predictive value of each feature. Model accuracy was measured as the F1 score.
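The derived indices and the correlation step can be illustrated with a minimal sketch. It assumes the common definitions of the N/L ratio (neutrophils divided by lymphocytes) and the SII index (platelets times neutrophils divided by lymphocytes); the abstract does not state which eGFR equation was used, so eGFR is left out. The function names and sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def nl_ratio(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (assumed definition)."""
    return neutrophils / lymphocytes


def sii(platelets, neutrophils, lymphocytes):
    """Systemic immune-inflammation index (assumed definition)."""
    return platelets * neutrophils / lymphocytes


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for illustration only: age versus infection status (1 = positive).
ages = [23, 35, 47, 52, 61, 70]
infected = [0, 0, 1, 0, 1, 1]
print(round(spearman(ages, infected), 3))
```

Because infection status is binary, it produces many tied ranks, which is why the sketch averages ranks within ties instead of using the shortcut formula that is valid only when all ranks are distinct.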
The parameters with the highest positive correlation coefficients were age, random serum glucose, serum urea, gender and serum cholesterol, whereas the strongest inverse correlations were found for alanine transaminase, red blood cell count and serum creatinine. The model distinguished positive from negative SARS-CoV-2 cases with an accuracy of 97%. The most important features were age, alanine transaminase, random serum glucose and red blood cell count.
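Since the reported accuracy is expressed as an F1 score, it may help to recall how that metric is computed. The following is a minimal sketch for a binary classifier; the function name and the labels are hypothetical and do not reproduce the study's model or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only (1 = SARS-CoV-2 positive).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))
```

Unlike plain accuracy, the F1 score ignores true negatives, so it is less flattering when one class dominates the sample.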
The current analysis identifies a number of parameters available for routine screening in a clinical setting. It also presents a tool built on these parameters that is useful for assessing a patient's individual risk of developing COVID-19. A limitation of the study is the demographic specificity of the studied population, which might restrict its general applicability.