Department of Statistics, National Chengchi University, Taipei, Taiwan, ROC.
Biom J. 2023 Aug;65(6):e2100373. doi: 10.1002/bimj.202100373. Epub 2023 May 9.
Feature screening is a widely used tool for detecting informative predictors in ultrahigh-dimensional data before conducting statistical analyses or building statistical models. Although a large body of feature screening procedures has been developed, most methods are restricted to either continuous or discrete responses. Moreover, even though many model-free feature screening methods have been proposed, they impose additional assumptions to guarantee their theoretical results. To address these difficulties and keep the implementation simple, in this paper we extend the rank-based coefficient of correlation to develop a feature screening procedure. We show that the new screening criterion can handle both continuous and binary responses. Theoretically, we establish the sure screening property to justify the proposed method. Simulation studies demonstrate that predictors with nonlinear and oscillatory trajectories are successfully retained regardless of the distribution of the response. Finally, the proposed method is applied to two microarray datasets.
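To make the idea of rank-based screening concrete, the following is a minimal sketch, not the procedure proposed in the paper. The abstract does not specify which rank-based coefficient is extended or how, so the sketch assumes a Chatterjee-type coefficient in its tie-aware form, which remains defined when the response is binary. The function names `xi_coefficient` and `screen_features` and the cutoff `d` are hypothetical, and ties among predictor values are left in input order rather than broken at random.

```python
import bisect


def xi_coefficient(x, y):
    """Tie-aware rank coefficient measuring how well y is a function of x.

    Assumed form (not taken from the abstract):
        xi = 1 - n * sum|r[i+1] - r[i]| / (2 * sum l[i] * (n - l[i]))
    where the observations are ordered by x, r[i] counts y-values <= y[i]
    and l[i] counts y-values >= y[i].
    """
    n = len(x)
    # Order observations by the predictor (stable sort keeps tied x in input order).
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]
    y_sorted = sorted(y)
    # r[i] = #{j : y_j <= y_i},  l[i] = #{j : y_j >= y_i}
    r = [bisect.bisect_right(y_sorted, v) for v in y_by_x]
    l = [n - bisect.bisect_left(y_sorted, v) for v in y_by_x]
    denom = 2.0 * sum(li * (n - li) for li in l)
    if denom == 0:
        # The response is constant, so no predictor can be informative.
        return 0.0
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - n * jumps / denom


def screen_features(columns, y, d):
    """Score every predictor column against y and keep the d highest scores.

    `columns` is a list of predictor columns, each a sequence of length len(y).
    Returns (indices of retained predictors, list of all scores).
    """
    scores = [xi_coefficient(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores
```

Because the score depends only on ranks, the same function can be called with a continuous response or a 0/1 response without modification. The number of retained predictors `d` is left to the user here; the abstract does not state how the cutoff is chosen.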