McGaughey Georgia, Walters W Patrick, Goldman Brian
Modeling & Informatics, Vertex Pharmaceuticals, Boston, MA, USA.
F1000Res. 2016 Apr 7;5. doi: 10.12688/f1000research.8317.3. eCollection 2016.
Three methods (logistic regression, covariate shift and k-NN) were applied to five internal datasets and one external, publicly available dataset in which covariate shift existed. In all cases, k-NN performed worse than either logistic regression or covariate shift. Surprisingly, reweighting the training data to correct for covariate shift showed no obvious advantage in the datasets examined.
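To make the idea of reweighting training data under covariate shift concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights were estimated, so this example assumes one common approach, in which a "domain classifier" separates training from test inputs and its odds are used as importance weights. The synthetic one-dimensional data, the helper names (`fit_logistic`, `importance_weights`) and all hyperparameters are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, sample_weights=None, lr=0.1, epochs=500):
    """Fit a 1-D logistic regression (slope, intercept) by weighted gradient descent."""
    if sample_weights is None:
        sample_weights = [1.0] * len(xs)
    total = sum(sample_weights)
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        g_slope = g_intercept = 0.0
        for x, y, s in zip(xs, ys, sample_weights):
            err = sigmoid(slope * x + intercept) - y
            g_slope += s * err * x
            g_intercept += s * err
        slope -= lr * g_slope / total
        intercept -= lr * g_intercept / total
    return slope, intercept


def importance_weights(train_x, test_x):
    """Estimate p_test(x) / p_train(x) for each training point.

    Assumption: the density ratio is approximated by the odds of a
    domain classifier (0 = training set, 1 = test set), rescaled by
    the ratio of the two sample sizes.
    """
    xs = list(train_x) + list(test_x)
    domain = [0] * len(train_x) + [1] * len(test_x)
    slope, intercept = fit_logistic(xs, domain)
    size_ratio = len(train_x) / len(test_x)
    weights = []
    for x in train_x:
        p = sigmoid(slope * x + intercept)
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against division by zero
        weights.append(size_ratio * p / (1.0 - p))
    return weights


# Synthetic example (hypothetical data): the test inputs are shifted
# relative to the training inputs, while the labelling rule is the same.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_x = [rng.gauss(1.5, 1.0) for _ in range(200)]
train_y = [1 if x + rng.gauss(0.0, 0.5) > 0.5 else 0 for x in train_x]

weights = importance_weights(train_x, test_x)

# Plain logistic regression versus the importance-weighted fit.
unweighted_model = fit_logistic(train_x, train_y)
weighted_model = fit_logistic(train_x, train_y, sample_weights=weights)
```

Training points that look like test points receive larger weights, so the weighted fit emphasizes the region the model will be evaluated on. Whether that helps in practice is an empirical question; the abstract reports no obvious advantage on the datasets examined.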