Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time.

Author information

Tian Suyan, Wang Chi

Affiliations

Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China.

Center for Applied Statistical Research, School of Mathematics, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China.

Publication information

Biomed Res Int. 2019 Mar 19;2019:1724898. doi: 10.1155/2019/1724898. eCollection 2019.

Abstract

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods for gene expression profiles measured across time points has not kept pace with the growth of such data. Feature selection is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene's expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection problem to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign averages of genes across time) as predictors were then optimized by either the coordinate descent method or the threshold gradient descent regularization method. By applying the proposed methods to simulated data and a traumatic injury dataset, we demonstrated that they, especially the combination of sign average and threshold gradient descent regularization, outperform other competing algorithms. In conclusion, the proposed methods are highly recommended for studies that aim to carry out feature selection for longitudinal gene expression data.
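The two steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not define the sign average, so several details are assumptions: each time point's sign is taken from the direction of the case-minus-control mean difference, the thresholded update follows the general threshold gradient descent idea (move only coefficients whose gradient magnitude is within a fraction `tau` of the largest), and the intercept is updated without thresholding. The function names `sign_average` and `tgdr_logistic`, the parameters `tau`, `step`, and `n_iter`, and the toy data are all hypothetical.

```python
import math

def sign_average(expr, labels):
    """Collapse expr[i][j][t] (subject i, gene j, time t) into pseudo[i][j].

    Assumption: the sign at each time point is the direction of the
    case-minus-control mean difference; the abstract does not specify this.
    """
    n, p, T = len(expr), len(expr[0]), len(expr[0][0])
    cases = [i for i in range(n) if labels[i] == 1]
    ctrls = [i for i in range(n) if labels[i] == 0]
    pseudo = [[0.0] * p for _ in range(n)]
    for j in range(p):
        signs = []
        for t in range(T):
            diff = (sum(expr[i][j][t] for i in cases) / len(cases)
                    - sum(expr[i][j][t] for i in ctrls) / len(ctrls))
            signs.append(1.0 if diff >= 0 else -1.0)
        for i in range(n):
            # Signed mean over time: one value per subject and gene.
            pseudo[i][j] = sum(s * x for s, x in zip(signs, expr[i][j])) / T
    return pseudo

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tgdr_logistic(X, y, tau=0.9, step=0.05, n_iter=200):
    """Thresholded gradient ascent on the logistic log-likelihood."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [yi - sigmoid(b0 + sum(b * x for b, x in zip(beta, row)))
                 for yi, row in zip(y, X)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        gmax = max(abs(g) for g in grad)
        if gmax == 0.0:
            break
        for j in range(p):
            # Update only coefficients whose gradient is near the largest one.
            if abs(grad[j]) >= tau * gmax:
                beta[j] += step * grad[j]
        b0 += step * sum(resid) / n  # assumption: intercept is not thresholded
    return b0, beta

# Toy data (hypothetical): 4 subjects, 2 genes, 3 time points.
expr = [
    [[2.0, 2.0, -2.0], [0.3, 0.1, 0.0]],    # case
    [[3.0, 1.0, -1.0], [0.1, 0.1, 0.0]],    # case
    [[-2.0, -2.0, 2.0], [0.1, 0.1, 0.0]],   # control
    [[-1.0, -3.0, 1.0], [0.1, 0.3, 0.0]],   # control
]
labels = [1, 1, 0, 0]

pseudo = sign_average(expr, labels)
b0, beta = tgdr_logistic(pseudo, labels, tau=0.9)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)
```

In this toy input, gene 0 changes direction at the third time point, so a plain average over time would partly cancel its signal, whereas the signed average keeps it. With `tau=0.9`, only gene 0 receives a nonzero coefficient. In this sketch, lower values of `tau` let more coefficients move at each step and so give denser models.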
