Department of Clinical Sciences, Division of Oncology, Lund University, 22381 Lund, Sweden.
Bioinformatics. 2021 Sep 29;37(18):3043-3044. doi: 10.1093/bioinformatics/btab088.
k-Top Scoring Pairs (kTSP) algorithms predict class from within-sample rules that compare the expression of paired genes, and have demonstrated excellent performance and robustness. The available packages and tools focus primarily on binary prediction (i.e. two classes). However, many real-world classification problems, e.g. tumor subtype prediction, are multiclass tasks.
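To make the pair-rule idea concrete, the following is a minimal sketch of binary kTSP prediction by majority vote. It is an illustration under stated assumptions, not the implementation of any particular package: the gene names, expression values, and the function `ktsp_predict` are hypothetical, and rule selection (training) is omitted.

```python
from typing import Dict, List, Tuple

# A rule (a, b) votes for class 1 when gene a is expressed above gene b
# within the same sample. All names here are hypothetical.
Pair = Tuple[str, str]


def ktsp_predict(sample: Dict[str, float], pairs: List[Pair]) -> int:
    """Return 1 if a strict majority of pair rules vote for class 1, else 0."""
    votes = sum(1 for a, b in pairs if sample[a] > sample[b])
    return 1 if 2 * votes > len(pairs) else 0


# Made-up expression values for one sample.
sample = {
    "GENE_A": 5.2, "GENE_B": 3.1,
    "GENE_C": 0.4, "GENE_D": 2.0,
    "GENE_E": 7.5, "GENE_F": 1.1,
}
pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")]

prediction = ktsp_predict(sample, pairs)

# Rescaling the whole sample leaves every within-sample comparison unchanged,
# so the prediction is the same.
rescaled = {gene: 10.0 * value for gene, value in sample.items()}
prediction_rescaled = ktsp_predict(rescaled, pairs)
```

Because each rule only compares two values inside one sample, the prediction depends on relative ordering rather than absolute scale, which is why such classifiers can be applied to a single sample at a time.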
Here, we present multiclassPairs, an R package to train pair-based single-sample classifiers for multiclass problems. multiclassPairs offers two main methods to build multiclass prediction models: a one-versus-rest kTSP scheme and a novel pair-based Random Forest approach. The package also provides options for dealing with class imbalance, multiplatform training and missing features in test data, and for visualizing training and test results.
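As a rough illustration of how a one-versus-rest pair scheme can produce a multiclass call, the sketch below scores each class by the fraction of its pair rules that hold in a sample and picks the highest-scoring class. This is a simplified assumption about the general scheme, not the package's actual scoring or tie-handling; the class labels, gene names, and the function `one_vs_rest_predict` are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def one_vs_rest_predict(
    sample: Dict[str, float],
    class_pairs: Dict[str, List[Pair]],
) -> Tuple[str, Dict[str, float]]:
    """Score each class by the fraction of its rules satisfied; return the top class.

    Each class has its own rule list, built to separate that class from the rest.
    A rule (a, b) supports the class when sample[a] > sample[b].
    """
    scores = {
        label: sum(1 for a, b in pairs if sample[a] > sample[b]) / len(pairs)
        for label, pairs in class_pairs.items()
    }
    # Ties are resolved by dictionary order here; a real tool may handle them differently.
    best = max(scores, key=scores.get)
    return best, scores


# Made-up expression values and rules for three hypothetical subtypes.
sample = {"g1": 8.0, "g2": 2.0, "g3": 5.0, "g4": 6.0}
class_pairs = {
    "subtype_A": [("g1", "g2"), ("g3", "g2")],
    "subtype_B": [("g2", "g1"), ("g4", "g3")],
    "subtype_C": [("g2", "g3"), ("g2", "g4")],
}

predicted, scores = one_vs_rest_predict(sample, class_pairs)
```

In this toy example both rules for `subtype_A` hold, one of two holds for `subtype_B`, and none hold for `subtype_C`, so the sample is assigned to `subtype_A`.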
The multiclassPairs package is available on CRAN and GitHub: https://github.com/NourMarzouka/multiclassPairs.
Supplementary data are available at Bioinformatics online.