Zhang Lin, Lin Yexiang, Wang Kaiyue, Han Lifeng, Zhang Xue, Gao Xiumei, Li Zheng, Zhang Houliang, Zhou Jiashun, Yu Heshui, Fu Xuebin
State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China.
Biomedical Engineering, Imperial College London, London, United Kingdom.
Front Cardiovasc Med. 2023 Jan 11;9:1044443. doi: 10.3389/fcvm.2022.1044443. eCollection 2022.
Machine learning (ML) has gained wide popularity in many fields, including disease diagnosis in healthcare. However, a single algorithm is limited in its ability to explore diagnostic value in dilated cardiomyopathy (DCM). We aimed to develop a novel approach based on the overall normalized sum of weights across multiple ML models to assess diagnostic value in DCM.
Gene expression data were selected from previously published databases according to eligibility criteria (six eligible microarray datasets, 386 samples). Two microarray datasets were used for training; the others were used as testing sets (ratio 5:1). In total, we identified 20 differentially expressed genes (DEGs) between DCM and control individuals (7 upregulated and 13 downregulated).
We developed six classification ML methods to identify potential candidate genes based on their overall weights. Three genes, serine proteinase inhibitor A3 (SERPINA3), frizzled-related protein 3 (FRZB), and ficolin 3 (FCN3), were finally identified and evaluated by receiver operating characteristic (ROC) analysis. Interestingly, all three genes correlated considerably with plasma cells. Importantly, in both the training and testing sets, the areas under the curve (AUCs) for SERPINA3, FRZB, and FCN3 were greater than 0.88. One of the three genes had a particularly high AUC (0.940 in the training sets and 0.918 in the testing sets), indicating that it is a potentially functional gene in DCM. Notably, the plasma levels of SERPINA3, FCN3, and FRZB in DCM patients differed significantly from those in healthy controls.
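The abstract does not spell out how the overall weights were computed. As a minimal sketch of one plausible reading, the code below normalizes each model's per-gene weights to sum to 1, adds them across models, and renormalizes the total. The function names, the use of absolute values, and the toy numbers with placeholder gene labels are all assumptions for illustration, not the authors' implementation or data.

```python
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale one set of gene weights so their absolute values sum to 1.

    Assumption: the sign of a weight (e.g. a linear coefficient) is ignored
    and only its magnitude counts as importance.
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        return {gene: 0.0 for gene in weights}
    return {gene: abs(w) / total for gene, w in weights.items()}


def overall_weights(per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum the normalized weights across models, then renormalize the sums."""
    summed: Dict[str, float] = {}
    for model_weights in per_model:
        for gene, w in normalize(model_weights).items():
            summed[gene] = summed.get(gene, 0.0) + w
    return normalize(summed)


def top_genes(per_model: List[Dict[str, float]], k: int = 3) -> List[str]:
    """Return the k genes with the largest overall weight."""
    overall = overall_weights(per_model)
    return sorted(overall, key=overall.get, reverse=True)[:k]


# Hypothetical toy input: two models scoring three placeholder genes.
toy_weights = [
    {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 1.0},
    {"gene_a": -3.0, "gene_b": 0.0, "gene_c": 1.0},
]
```

With the toy input, `top_genes(toy_weights, k=2)` ranks `gene_a` ahead of `gene_c`. Normalizing per model first keeps a model with a large raw weight scale from dominating the sum, which is the usual motivation for combining models this way.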
SERPINA3, FRZB, and FCN3 might be potential diagnostic targets for DCM. Further verification work is warranted.