Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time.

Author information

Tian Suyan, Wang Chi

Affiliations

Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China.

Center for Applied Statistical Research, School of Mathematics, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China.

Publication information

Biomed Res Int. 2019 Mar 19;2019:1724898. doi: 10.1155/2019/1724898. eCollection 2019.

DOI: 10.1155/2019/1724898

PMID: 31016185

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6444255/

Abstract

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods for gene expression profiles measured across time points has not kept pace with the explosion of such data. Feature selection is of critical importance for longitudinal microarray data. In this study, we proposed aggregating each gene's expression values across time into a single value using the sign average method, thereby reducing the longitudinal feature selection problem to a classic (cross-sectional) one. Regularized logistic regression models that use these pseudogenes (i.e., each gene's sign average across time) as predictors were then fitted with either the coordinate descent method or the threshold gradient descent regularization (TGDR) method. By applying the proposed methods to simulated data and a traumatic injury dataset, we demonstrated that the proposed methods, especially the combination of sign average and threshold gradient descent regularization, outperform competing algorithms. In conclusion, the proposed methods are highly recommended for studies that aim to perform feature selection on longitudinal gene expression data.
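The two-stage pipeline summarized above (collapse each gene's time course into one pseudogene, then fit a sparse logistic model) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-time-point sign rule (direction of the class-mean difference of standardized values), the step size `step`, the threshold `tau`, the fixed number of steps `n_steps`, and all function names are hypothetical choices made for this example. In practice the number of TGDR steps and `tau` would be tuned, for example by cross-validation, and the coordinate descent alternative is not shown.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Center and scale one gene's values at one time point (population SD)."""
    mu, sd = mean(values), pstdev(values)
    sd = sd or 1.0  # guard against a constant column
    return [(v - mu) / sd for v in values]


def sign_average_pseudogenes(X, y):
    """Collapse X[i][g][t] (subject i, gene g, time t) into P[i][g].

    Hypothetical sign rule (an assumption of this sketch): at each time point
    the sign is the direction of the class-mean difference (class 1 minus
    class 0) of the standardized values. y must contain both 0 and 1.
    """
    n, n_genes, n_times = len(X), len(X[0]), len(X[0][0])
    P = [[0.0] * n_genes for _ in range(n)]
    for g in range(n_genes):
        for t in range(n_times):
            z = standardize([X[i][g][t] for i in range(n)])
            diff = (mean(z[i] for i in range(n) if y[i] == 1)
                    - mean(z[i] for i in range(n) if y[i] == 0))
            s = 1.0 if diff >= 0 else -1.0
            # Average the sign-aligned standardized values over time points.
            for i in range(n):
                P[i][g] += s * z[i] / n_times
    return P


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def tgdr_logistic(P, y, tau=1.0, step=0.1, n_steps=200):
    """Threshold gradient descent regularization for logistic regression.

    At each step only coefficients whose log-likelihood gradient satisfies
    |g_j| >= tau * max_k |g_k| are moved; tau = 1 gives the sparsest path and
    tau = 0 updates every coefficient. The intercept is unpenalized.
    """
    n, p = len(P), len(P[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_steps):
        resid = [y[i] - sigmoid(b0 + sum(b * x for b, x in zip(beta, P[i])))
                 for i in range(n)]
        grad = [sum(resid[i] * P[i][j] for i in range(n)) / n for j in range(p)]
        b0 += step * mean(resid)
        g_max = max(abs(g) for g in grad)
        if g_max == 0.0:
            break
        for j in range(p):
            if abs(grad[j]) >= tau * g_max:
                beta[j] += step * grad[j]
    return b0, beta


# Toy demo: gene 0 separates the classes, with its direction flipped at t = 1;
# genes 1-4 are pure noise.
random.seed(0)
n, n_genes, n_times = 60, 5, 3
y = [i % 2 for i in range(n)]
X = [[[(1.5 * y[i] * (-1 if t == 1 else 1) if g == 0 else 0.0)
       + random.gauss(0.0, 1.0) for t in range(n_times)]
      for g in range(n_genes)] for i in range(n)]
P = sign_average_pseudogenes(X, y)
b0, beta = tgdr_logistic(P, y)
print([round(b, 3) for b in beta])
```

The sign alignment matters in the demo: without it, gene 0's reversed direction at t = 1 would partly cancel its signal when averaged over time. With `tau=1.0`, only the coefficient with the largest gradient moves at each step, so the informative pseudogene should dominate the fitted coefficients; lowering `tau` toward 0 lets more pseudogenes enter and behaves more like plain gradient descent on all coefficients.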

Similar articles

A longitudinal feature selection method identifies relevant genes to distinguish complicated injury and uncomplicated injury over time.

BMC Med Inform Decis Mak. 2018 Dec 7;18(Suppl 5):115. doi: 10.1186/s12911-018-0685-8.

GEE-TGDR: A Longitudinal Feature Selection Algorithm and Its Application to lncRNA Expression Profiles for Psoriasis Patients Treated with Immune Therapies.

Biomed Res Int. 2021 Apr 9;2021:8862895. doi: 10.1155/2021/8862895. eCollection 2021.

Structured feature selection using coordinate descent optimization.

BMC Bioinformatics. 2016 Apr 8;17:158. doi: 10.1186/s12859-016-0954-4.

Multiobjective triclustering of time-series transcriptome data reveals key genes of biological processes.

BMC Bioinformatics. 2015 Jun 26;16:200. doi: 10.1186/s12859-015-0635-8.

To select relevant features for longitudinal gene expression data by extending a pathway analysis method.

F1000Res. 2018 Jul 31;7:1166. doi: 10.12688/f1000research.15357.1. eCollection 2018.

Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):312-321. doi: 10.1109/TCBB.2017.2767589. Epub 2017 Oct 30.

Clustering threshold gradient descent regularization: with applications to microarray studies.

Bioinformatics. 2007 Feb 15;23(4):466-72. doi: 10.1093/bioinformatics/btl632. Epub 2006 Dec 20.

Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data.

Bioinformatics. 2015 Feb 1;31(3):397-404. doi: 10.1093/bioinformatics/btu660. Epub 2014 Oct 6.

Robust gene selection methods using weighting schemes for microarray data analysis.

BMC Bioinformatics. 2017 Sep 2;18(1):389. doi: 10.1186/s12859-017-1810-x.

Cited by

GEE-TGDR: A Longitudinal Feature Selection Algorithm and Its Application to lncRNA Expression Profiles for Psoriasis Patients Treated with Immune Therapies.

Biomed Res Int. 2021 Apr 9;2021:8862895. doi: 10.1155/2021/8862895. eCollection 2021.

An ensemble of the iCluster method to analyze longitudinal lncRNA expression data for psoriasis patients.

Hum Genomics. 2021 Apr 20;15(1):23. doi: 10.1186/s40246-021-00323-6.