Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China.
Department of Biostatistics, Markey Cancer Center, The University of Kentucky, 800 Rose St., Lexington, KY 40536, USA.
Biomed Res Int. 2019 Apr 3;2019:2497509. doi: 10.1155/2019/2497509. eCollection 2019.
Feature selection is critical for analyzing gene expression data with sophisticated grouping structures and for extracting hidden patterns from such data. It is well known that genes do not function in isolation but work together within metabolic, regulatory, and signaling pathways. A method that takes the biological knowledge contained in these pathways into account is called a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, which considers no biological knowledge. In this article, we first divide pathway-based feature selection methods into three major categories: pathway-level selection, bilevel selection, and pathway-guided gene selection. Regarding bilevel selection as a special case of pathway-guided gene selection, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Finally, we point out a potential use of pathway-guided gene selection in one active research avenue, namely the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.