Ma Ziyang, Ahn Jeongyoun
Department of Statistics, University of Georgia, Athens, GA 30602, USA.
Department of Industrial and Systems Engineering, KAIST, 34141, South Korea.
Bioinformatics. 2021 Oct 11;37(19):3270-3276. doi: 10.1093/bioinformatics/btab320.
Ordinal classification problems arise in a variety of real-world applications in which samples need to be classified into categories with a natural ordering. One example of high-dimensional ordinal classification is the use of gene expression data to predict ordinal drug response, which has been increasingly studied in pharmacogenetics. Classical ordinal classification methods typically cannot handle high-dimensional data, while standard high-dimensional classification methods discard the ordering information among the classes. Existing high-dimensional ordinal classification approaches usually assume a linear ordinality among the classes. We argue that manually labeled ordinal classes may not be linearly arranged in the data space, especially in complex high-dimensional problems.
We propose a new approach that projects high-dimensional data into a lower-dimensional discriminant subspace in which the innate ordinal structure of the classes is uncovered. The proposed method weights the features according to their rank correlations with the class labels and incorporates the weights into the framework of linear discriminant analysis. We apply the method to predict the responses of patients with multiple myeloma to each of two types of drugs. A comparative analysis against existing ordinal and nominal methods demonstrates that the proposed method achieves competitive predictive performance while honoring the intrinsic ordinal structure of the classes. We also interpret the genes selected by the proposed approach to understand their drug-specific response mechanisms.
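The weighting idea described above can be illustrated with a minimal sketch. This is not the authors' FWOC implementation: it assumes Spearman correlation as the rank correlation, uses the absolute correlation directly as the feature weight, and solves a ridge-regularized linear discriminant analysis on the reweighted features. The function names, the `ridge` parameter, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def rank_correlation_weights(X, y):
    """Absolute Spearman correlation of each column of X with ordinal labels y."""
    ry = average_ranks(y)
    ry = ry - ry.mean()
    w = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        rx = average_ranks(X[:, j])
        rx = rx - rx.mean()
        denom = np.sqrt((rx @ rx) * (ry @ ry))
        if denom > 0:
            w[j] = abs(rx @ ry) / denom
    return w


def weighted_lda_directions(X, y, n_components=1, ridge=1e-3):
    """Discriminant directions of LDA fitted on rank-correlation-weighted features.

    Assumption: the weights simply rescale the centered features, and a small
    ridge term keeps the within-class scatter matrix invertible.
    """
    w = rank_correlation_weights(X, y)
    Xw = (X - X.mean(axis=0)) * w  # columns stay centered after rescaling
    p = X.shape[1]
    Sw = np.zeros((p, p))  # within-class scatter
    Sb = np.zeros((p, p))  # between-class scatter
    for k in np.unique(y):
        Xk = Xw[y == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        Sb += len(Xk) * np.outer(mk, mk)  # overall mean is zero
    # Solve Sb v = lambda (Sw + ridge * I) v and keep the leading directions.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(p), Sb))
    top = np.argsort(-evals.real)[:n_components]
    return w, evecs[:, top].real


# Synthetic illustration: 3 ordered classes, 10 features, only the first
# two features shift with the class label.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 10))
X[:, :2] += 2.0 * y[:, None]

w, V = weighted_lda_directions(X, y)
scores = ((X - X.mean(axis=0)) * w) @ V[:, 0]
class_means = np.array([scores[y == k].mean() for k in np.unique(y)])
print("feature weights:", np.round(w, 2))
print("class means on first discriminant direction:", np.round(class_means, 2))
```

In this sketch the two informative features receive the largest weights, so noise features contribute little to the discriminant direction. The sign of an eigenvector is arbitrary, so the projected class means may appear in increasing or decreasing order.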
The data underlying this article are available in the Gene Expression Omnibus database at https://www.ncbi.nlm.nih.gov/geo/ under accession numbers GSE9782 and GSE68871. The source code for feature-weighted ordinal classification (FWOC) is available at https://github.com/pisuduo/Feature-Weighted-Ordinal-Classification-FWOC.
Supplementary data are available at Bioinformatics online.