Institute for Systems Biology, Seattle, WA 98109, USA.
BMC Bioinformatics. 2012 Sep 11;13:227. doi: 10.1186/1471-2105-13-227.
Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm generalizes these relative expression algorithms: it uses generic permutations and a dynamic classifier size (N) to control both the permutation space and the combination space available for classification.
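As an illustration of the relative expression idea, the following is a minimal Python sketch, not the authors' implementation. It assumes two classes labeled 0 and 1, an exhaustive search over every combination of N features, and a score equal to the summed absolute difference in per-class frequency of each observed ordering; the abstract does not state the scoring rule, so that choice is an assumption, as are the names `ordering`, `fit_tsn`, and `predict_tsn`. For N = 2 this reduces to a TSP-style comparison of how often one feature is smaller than the other in each class.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def ordering(values):
    """Return the indices that sort `values` ascending (ties broken by position)."""
    return tuple(int(i) for i in np.argsort(values, kind="stable"))


def fit_tsn(X, y, n=2):
    """Pick the n-feature combination whose orderings best separate classes 0 and 1.

    X is a samples-by-features array and y holds 0/1 labels (both classes
    must be present). The score used here is an assumed, illustrative one.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    sizes = {c: int(np.sum(y == c)) for c in (0, 1)}
    best = None
    for feats in combinations(range(X.shape[1]), n):
        sub = X[:, list(feats)]
        # Count how often each ordering (permutation) occurs in each class.
        counts = {c: Counter(ordering(row) for row in sub[y == c]) for c in (0, 1)}
        perms = set(counts[0]) | set(counts[1])
        freq = {p: (counts[0][p] / sizes[0], counts[1][p] / sizes[1]) for p in perms}
        score = sum(abs(f0 - f1) for f0, f1 in freq.values())
        if best is None or score > best["score"]:
            best = {"features": feats, "freq": freq, "score": score}
    return best


def predict_tsn(model, x):
    """Assign the class in which the sample's ordering was seen more often."""
    x = np.asarray(x, dtype=float)
    perm = ordering(x[list(model["features"])])
    f0, f1 = model["freq"].get(perm, (0.0, 0.0))
    return 0 if f0 >= f1 else 1
```

Because only the within-sample ordering of the selected features is used, any strictly increasing transformation of the data (for example a log transform) leaves the fitted classifier unchanged, which is the normalization invariance mentioned above. The exhaustive search grows combinatorially with the number of features and with N, so a practical implementation would need feature filtering or a faster search.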
TSN was tested on nine cancer datasets and showed statistically significant differences in classification accuracy between classifier sizes (choices of N). On the Microarray Quality Control II datasets, TSN also performed competitively against a wide variety of classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, naïve Bayes, and support vector machines. Furthermore, TSN exhibits low levels of overfitting on training data compared with other methods, giving confidence that results obtained during cross-validation will be more generally applicable to external validation sets.
TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracy when fewer measured features are available.