Yang Qingxia, Gong Yaguo, Zhu Feng
Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
Anal Chem. 2023 Apr 4;95(13):5542-5552. doi: 10.1021/acs.analchem.2c04402. Epub 2023 Mar 21.
Multiclass metabolomics has been widely applied in clinical practice to understand the pathophysiological processes involved in disease progression and to identify diagnostic biomarkers of various disorders. Compared with the binary problem, multiclass classification makes it harder to obtain reliable and stable results, because determining exact class decision boundaries becomes more complex. In particular, the choice of biomarker discovery and classification methods strongly affects the multiclass model, since methods built on substantially different theories can produce conflicting results even for the same dataset. However, a systematic assessment for selecting the most appropriate biomarker discovery and classification methods for multiclass metabolomics is still lacking. A comprehensive assessment is therefore essential to measure the suitability of methods in multiclass classification models from multiple perspectives. In this study, five biomarker discovery methods and nine classification methods were assessed on four benchmark datasets of multiclass metabolomics. Performance was assessed using three evaluation criteria: cluster analysis of sample grouping, biomarker consistency across multiple subgroups, and accuracy of the classification model. As a result, 13 combined strategies with superior performance under multiple criteria were selected based on these benchmark datasets. In conclusion, strategies that performed consistently well are suggested for biomarker discovery and for the construction of classification models in multiclass metabolomics.
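To make the second criterion (biomarker consistency across multiple subgroups) concrete, the following is a minimal sketch and not the procedure used in the study. It assumes, purely for illustration, that features are ranked by a one-way ANOVA F statistic, that subgroups are random equal-sized splits of the samples, and that consistency is the mean pairwise Jaccard overlap of the top-k feature sets. The function names, the synthetic data, and all parameter values are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand = mean(values)
    between = 0.0
    within = 0.0
    for g in groups.values():
        m = mean(g)
        between += len(g) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def top_k_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    n_features = len(X[0])
    scores = [f_statistic([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def consistency(X, y, k=2, n_subgroups=3, seed=0):
    """Mean pairwise Jaccard overlap of top-k features across random subgroups."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    parts = [idx[i::n_subgroups] for i in range(n_subgroups)]
    selected = [
        top_k_features([X[i] for i in p], [y[i] for i in p], k) for p in parts
    ]
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(selected, 2)]
    return mean(overlaps)


def make_demo_data(n_per_class=30, n_features=6, seed=1):
    """Synthetic 3-class data (hypothetical): features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 10.0 * label   # informative feature, increases with class
            row[1] -= 10.0 * label   # informative feature, decreases with class
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print("top features:", sorted(top_k_features(X, y, 2)))
    print("consistency:", consistency(X, y, k=2))
```

A score near 1 means the same features are selected regardless of which subgroup is used, while a score near 0 means the selected biomarkers depend heavily on the sample subset. Any other ranking method could be substituted for `f_statistic` without changing the consistency calculation.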