Tsai Yiting, Nanthakumar Vikash, Mohammadi Saeed, Baldwin Susan A, Gopaluni Bhushan, Geng Fei
University of British Columbia, 2360 East Mall, Vancouver, BC V6T 1Z3, Canada.
McMaster University, 1280 Main St W, Hamilton, ON L8S 4L8, Canada.
iScience. 2023 Sep 28;26(11):108006. doi: 10.1016/j.isci.2023.108006. eCollection 2023 Nov 17.
Protein biomarkers can be used to characterize symptom classes, which describe the metabolic or immunodeficient state of patients during the progression of a specific disease. Recent literature has shown that machine learning methods can complement traditional clinical methods in identifying biomarkers. However, many machine learning frameworks apply only narrowly to a specific archetype or subset of diseases. In this paper, we propose a feature extractor that can discover protein biomarkers for a wide variety of classification problems. The feature extractor uses a special type of deep learning model that discovers a latent space allowing for optimal class separation and enhanced class cluster identity. The extracted biomarkers can then be used to train highly accurate supervised learning models. We apply our methods to one dataset involving COVID-19 patients and another involving scleroderma patients, demonstrating improved class separation and reduced false discovery rates compared with results obtained using traditional models.