Guzmán Rivera Jeisac, Zheng Haiyan, Richlin Benjamin, Suarez Christian, Gaur Sunanda, Ricciardi Elizabeth, Hasan Uzma N, Cuddy William, Singh Aalok R, Bukulmez Hulya, Kaelber David C, Kimura Yukiko, Brady Patrick W, Wahezi Dawn, Rothschild Evin, Lakhani Saquib A, Herbst Katherine W, Hogan Alexander H, Salazar Juan C, Moroso-Fela Sandra, Roy Jason, Kleinman Lawrence C, Horton Daniel B, Moore Dirk F, Gennaro Maria Laura
Public Health Research Institute, Rutgers New Jersey Medical School, Rutgers Biomedical and Health Sciences, Newark, NJ.
Center for Advanced Biotechnology and Medicine.
medRxiv. 2025 Apr 25:2025.04.17.25325767. doi: 10.1101/2025.04.17.25325767.
We demonstrate an approach that integrates biomarker analysis with machine learning to identify protein signatures, using the example of SARS-CoV-2-induced Multisystem Inflammatory Syndrome in Children (MIS-C).
We used plasma samples collected from subjects diagnosed with MIS-C and compared them first to controls with asymptomatic/mild SARS-CoV-2 infection and then to controls with pneumonia or Kawasaki disease. We used mass spectrometry to identify proteins. Support vector machine (SVM) algorithm-based classification schemes were used to analyze protein pathways. We assessed diagnostic accuracy using internal and external cross-validation.
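As a rough illustration of SVM classification with cross-validation, the sketch below trains a linear SVM by subgradient descent on the hinge loss and scores it by leave-one-out cross-validation. Everything specific here is an assumption made for illustration: the abstract does not state the kernel, hyperparameters, or fold scheme, and the feature values are synthetic placeholders for three standardised protein abundances, not study data.

```python
# Illustrative sketch only: synthetic data, assumed linear kernel and hyperparameters.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X is a list of feature tuples; y holds labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (case) or -1 (control) from the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        w, b = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(w, b, X[i]) == y[i]
    return correct / len(X)


# Hypothetical standardised abundances of three proteins per sample.
cases = [(2.0, 1.5, 1.8), (1.7, 2.2, 2.1), (2.4, 1.9, 1.6), (1.9, 2.0, 2.3)]
controls = [(-2.0, -1.6, -1.9), (-1.8, -2.1, -2.2), (-2.3, -1.7, -1.5), (-1.6, -2.4, -2.0)]
X = cases + controls
y = [1] * len(cases) + [-1] * len(controls)

loo_accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

Leave-one-out is used here only because the toy dataset is tiny; with real cohorts, a k-fold scheme and a separate validation set would be the more usual choice.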
Proteomic analysis of a training dataset containing MIS-C samples (N=17) and asymptomatic/mild SARS-CoV-2-infected control samples (N=20) identified 643 proteins, of which 101 were differentially expressed. Plasma proteins associated with inflammation and coagulation increased, and those associated with lipid metabolism decreased, in MIS-C relative to controls. The SVM algorithm identified a three-protein model (ORM1, AZGP1, SERPINA3) that distinguished MIS-C from controls in the training set with 90.0% specificity, 88.2% sensitivity, and an area under the curve (AUC) of 93.5%. Performance was retained in the validation dataset of MIS-C samples (N=17) and asymptomatic/mild SARS-CoV-2-infected control samples (N=10) (90.0% specificity, 84.2% sensitivity, 87.4% AUC). We next applied the same approach to compare MIS-C with similarly presenting syndromes, such as pneumonia (N=17) and Kawasaki disease (N=13), and found a distinct three-protein signature (VWF, SERPINA3, and FCGBP) that accurately distinguished MIS-C from the other conditions (97.5% specificity, 89.5% sensitivity, 95.6% AUC). We also developed a software tool that may be used to evaluate other protein pathway signatures using our data.
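For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUC can be computed from classifier scores. The labels, scores, and the 0.5 threshold are made-up illustrative values, not study data, and the function names are hypothetical; AUC is computed with the pairwise (Mann-Whitney) formulation, which may differ from the procedure the authors used.

```python
# Illustrative sketch only: toy labels and scores, not data from the study.

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    labels: 1 for cases, 0 for controls; a score >= threshold is called a case.
    """
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a pair (Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Four hypothetical cases followed by five hypothetical controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores)
area = auc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the three figures are reported together.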
We used MIS-C, a novel hyperinflammatory illness, to demonstrate that the use of mass spectrometry to identify candidate plasma proteins followed by machine learning, specifically SVM, is an efficient strategy for identifying and evaluating biomarker signatures for disease classification.