Silesian University of Technology, Institute of Automatic Control, Akademicka 16, 44-100 Gliwice, Poland.
Math Biosci Eng. 2013 Jun;10(3):667-690. doi: 10.3934/mbe.2013.10.667.
Feature selection for large-scale genomic data, for example from DNA microarray experiments, is one of the fundamental and well-investigated problems in modern computational biology. From the computational point of view, a selected gene list should have good predictive power; from the biological point of view, it should be interpretable and well explained. Recently, another property of selected gene lists has been increasingly investigated, namely their stability, which measures how the content and/or the order of the genes changes when the data are perturbed. In this paper we propose a new approach to the analysis of gene list stability, termed the sensitivity index, which does not require any data perturbation and allows the gene list that is most reliable in a biological sense to be chosen.
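To make the conventional, perturbation-based notion of stability concrete, the following is a minimal sketch of how it is often estimated: resample the data, reselect the top genes each time, and average the pairwise overlap of the resulting lists. This is not the sensitivity index proposed in the paper, which avoids perturbation altogether. The function names, the ranking score (absolute difference of class means), the stratified bootstrap, and the Jaccard overlap are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def top_k_genes(samples, labels, k):
    """Rank genes by |mean(class 0) - mean(class 1)| and return the top-k indices.

    Illustrative score only; any feature-ranking method could be substituted.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two gene sets: |A and B| / |A or B|."""
    return len(a & b) / len(a | b)


def perturbation_stability(samples, labels, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard overlap of top-k lists across bootstrap resamples.

    Resampling is done within each class so that both classes are always present.
    """
    rng = random.Random(seed)
    by_class = {
        c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)
    }
    gene_lists = []
    for _ in range(n_resamples):
        idx = []
        for members in by_class.values():
            idx += [rng.choice(members) for _ in members]
        gene_lists.append(
            top_k_genes([samples[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = list(combinations(gene_lists, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples x 4 genes, two classes.
    data_rng = random.Random(1)
    labels = [0, 0, 0, 1, 1, 1]
    samples = [[data_rng.gauss(2.0 * y if g == 0 else 0.0, 1.0) for g in range(4)]
               for y in labels]
    print(perturbation_stability(samples, labels, k=2))
```

A value near 1 means the selected list barely changes under resampling, and a value near 0 means it is almost entirely replaced. The cost of this estimate grows with the number of resamples, which is one practical motivation for a measure that needs no perturbation.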