Tian Suyan, Wang Chi
Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China.
Center for Applied Statistical Research, School of Mathematics, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China.
Biomed Res Int. 2019 Mar 19;2019:1724898. doi: 10.1155/2019/1724898. eCollection 2019.
With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods for gene expression profiles measured across time points has not kept pace with the explosion of such data. Feature selection is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene's expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection problem to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign averages of genes across time) as predictors were then optimized by either the coordinate descent method or the threshold gradient descent regularization method. By applying the proposed methods to simulated data and a traumatic injury dataset, we have demonstrated that the proposed methods, especially the combination of sign average and threshold gradient descent regularization, outperform other competing algorithms. In conclusion, the proposed methods are highly recommended for studies whose objective is feature selection for longitudinal gene expression data.
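The aggregation step can be illustrated with a small sketch. The abstract does not state how the sign for each time point is determined, so the code below assumes it is the sign of the difference in class means (class 1 minus class 0) at that time point; the function names `sign_average` and `mean_difference` and the toy data are hypothetical and are not taken from the paper. The sketch covers only the collapse from a samples × genes × time array to a samples × genes matrix of pseudogenes, not the subsequent regularized fit.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def mean_difference(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the values in class 1 minus the mean of the values in class 0."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def sign_average(
    expr: Sequence[Sequence[Sequence[float]]], labels: Sequence[int]
) -> List[List[float]]:
    """Collapse expr[sample][gene][time] into pseudo[sample][gene].

    ASSUMPTION (not from the source): the sign for gene g at time t is the
    sign of the class-mean difference of that gene at that time point.
    """
    n_samples, n_genes, n_times = len(expr), len(expr[0]), len(expr[0][0])
    pseudo = [[0.0] * n_genes for _ in range(n_samples)]
    for g in range(n_genes):
        # One sign per time point, estimated from all samples.
        signs = [
            sign(mean_difference([expr[i][g][t] for i in range(n_samples)], labels))
            for t in range(n_times)
        ]
        for i in range(n_samples):
            signed = sum(s * expr[i][g][t] for t, s in enumerate(signs))
            pseudo[i][g] = signed / n_times
    return pseudo


# Toy data: 4 samples, 2 genes, 2 time points (values are made up).
labels = [0, 0, 1, 1]
expr = [
    [[1.0, 5.0], [1.0, 1.0]],
    [[2.0, 6.0], [1.0, 1.0]],
    [[4.0, 2.0], [3.0, 3.0]],
    [[5.0, 1.0], [3.0, 3.0]],
]
pseudo = sign_average(expr, labels)
print(pseudo)  # [[-2.0, 1.0], [-2.0, 1.0], [1.0, 3.0], [2.0, 3.0]]
```

In this toy example the first gene is higher in class 1 at the first time point and lower at the second, so a plain average across time (3, 4, 3, 3) would not separate the classes, whereas the signed average (-2, -2, 1, 2) does. The resulting matrix has one column per gene and could then be passed to any L1-penalized logistic regression routine.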