Department of Electrical and Computer Engineering, Southern Illinois University, 1230 Lincoln Drive, Carbondale, 62901, IL, USA.
Department of Biostatistics, Indiana University School of Public Health, 410 West 10th Street, Suite 3000, Indianapolis, 46202, IN, USA.
BMC Bioinformatics. 2018 Jun 26;19(1):244. doi: 10.1186/s12859-018-2231-1.
The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes).
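As an illustration of the ranking-reversal idea, the TSP score for a gene pair is commonly written as the absolute difference, between the two classes, of the within-subject frequency with which gene i is expressed below gene j. The following is a minimal sketch under that assumption. The function names `tsp_scores` and `top_scoring_pair`, the 0/1 label coding, and the samples-by-genes array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def tsp_scores(X, y):
    """Pairwise TSP scores (illustrative sketch).

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of 0/1 class labels, one per sample (assumed coding).
    Returns an (n_genes, n_genes) matrix whose [i, j] entry is
    |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is below gene j within sample s
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)  # within-subject frequency in class 0
    p1 = less[y == 1].mean(axis=0)  # within-subject frequency in class 1
    return np.abs(p0 - p1)

def top_scoring_pair(X, y):
    """Return (i, j, score) for the first pair attaining the maximum score."""
    scores = tsp_scores(X, y)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j), float(scores[i, j])
```

Because the comparison is made within each subject, a gene whose expression stays flat across classes can still take part in a high-scoring pair when its partner moves from below it to above it, which is how a pivot gene can be selected.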
We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and, as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show that the proposed approach improves classification accuracy, avoids overfitting, and is less prone to selecting non-informative (pivot) genes.
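One plausible reading of the AUC-based estimator can be sketched as follows. This is an assumption based on the description above, not the paper's exact definition: within each class, the AUC between the expression values of gene i and gene j is estimated over all cross pairs of subjects (the Mann-Whitney estimate), and the pair score is the absolute difference of the two class-wise AUCs. The helper names `auc_less` and `auc_pair_score` and the 0/1 label coding are hypothetical.

```python
import numpy as np

def auc_less(a, b):
    """Mann-Whitney estimate of P(a < b) + 0.5 * P(a == b),
    taken over all cross pairs of values from a and b."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.mean((a < b) + 0.5 * (a == b)))

def auc_pair_score(X, y, i, j):
    """Assumed AUC-style score for gene pair (i, j).

    X: array of shape (n_samples, n_genes); y: 0/1 labels (assumed coding).
    Compares the whole distribution of gene i against that of gene j
    across all subjects of a class, rather than within each subject.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    auc0 = auc_less(X[y == 0, i], X[y == 0, j])  # separation in class 0
    auc1 = auc_less(X[y == 1, i], X[y == 1, j])  # separation in class 1
    return abs(auc0 - auc1)
```

Under this reading, the score depends on how well the two genes' expression distributions separate in each class, so a reversal that holds only narrowly within individual subjects contributes less than one in which the distributions are clearly apart.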
The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. Although the AUCTSP works on the same principle as TSP, it determines the top scoring gene pair from the relative rankings of two marker genes across all subjects rather than within each individual subject, which results in significant gains in classification accuracy. In addition, the proposed method tends to avoid selecting non-informative (pivot) genes as members of the top-scoring pair.