Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany.
Sci Rep. 2022 Aug 5;12(1):13458. doi: 10.1038/s41598-022-17609-x.
Machine learning methods trained on cancer cell line panels are extensively studied for the prediction of optimal anti-cancer therapies. While classification approaches distinguish effective from ineffective drugs, regression approaches aim to quantify the degree of drug effectiveness. However, the high specificity of most anti-cancer drugs induces a distribution of drug response values that is skewed toward the more drug-resistant cell lines, negatively affecting the classification performance (class imbalance) and regression performance (regression imbalance) for the sensitive cell lines. Here, we present a novel approach called SimultAneoUs Regression and classificatiON Random Forests (SAURON-RF) based on the idea of performing a joint regression and classification analysis. We demonstrate that SAURON-RF improves the classification and regression performance for the sensitive cell lines at the expense of a moderate performance loss for the resistant ones. Furthermore, our results show that simultaneous classification and regression can be superior to regression or classification alone.
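The abstract does not detail how SAURON-RF couples the two tasks. As a hedged illustration only, the Python sketch below shows the ideas the abstract motivates. It binarizes a continuous drug response into sensitive and resistant classes and upweights the minority (sensitive) class with inverse-frequency weights, the same n / (k · n_c) heuristic scikit-learn uses for `class_weight="balanced"`. It then derives both a class and a response value from one shared set of training samples. All names (`binarize`, `inverse_frequency_weights`, `joint_predict`, `THRESHOLD`), the toy responses, and the fixed neighbor list standing in for random forest leaf co-membership are hypothetical assumptions. None of this is the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy data: one drug-response value per cell line, where lower
# values mean more sensitive (e.g., ln(IC50)). Values and threshold are invented.
responses = [-1.2, 0.5, 2.1, 2.4, 1.9, 2.8, 3.0, 2.2]
THRESHOLD = 0.8  # hypothetical cut-off separating sensitive from resistant


def binarize(values, threshold):
    """Label 1 (sensitive) if the response is below the threshold, else 0 (resistant)."""
    return [1 if v < threshold else 0 for v in values]


def inverse_frequency_weights(labels):
    """Weight n / (k * n_c): every class then carries the same total weight n / k."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def joint_predict(neighbors, values, labels, weights):
    """Return (class, response) derived from one shared set of training samples.

    `neighbors` holds indices of training samples that share a leaf with the
    query; this stands in for random forest leaf co-membership (hypothetical).
    """
    votes = defaultdict(float)
    for i in neighbors:
        votes[labels[i]] += weights[i]
    predicted_class = max(votes, key=votes.get)  # weighted majority vote
    # Regression uses only the neighbors that belong to the predicted class.
    members = [i for i in neighbors if labels[i] == predicted_class]
    total = sum(weights[i] for i in members)
    predicted_value = sum(weights[i] * values[i] for i in members) / total
    return predicted_class, predicted_value


labels = binarize(responses, THRESHOLD)  # 2 sensitive, 6 resistant
weights = inverse_frequency_weights(labels)
neighbors = [0, 1, 2, 3, 4]  # hypothetical leaf co-members of one query

print(joint_predict(neighbors, responses, labels, weights))
print(joint_predict(neighbors, responses, labels, [1.0] * len(labels)))
```

With inverse-frequency weights, the two sensitive neighbors outvote the three resistant ones. The sketch therefore predicts the sensitive class with a response of about -0.35. With uniform weights, the resistant majority wins and the predicted response is about 2.13. This is only a toy illustration of how reweighting shifts predictions toward the minority class.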