Ye Jieping, Chen Jianhui, Janardan Ravi, Kumar Sudhir
Center for Evolutionary Functional Genomics and Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287
ACM Trans Knowl Discov Data. 2008 Mar;2(1). doi: 10.1145/1342320.1342324.
Gene expression in a developing embryo occurs in particular cells (spatial patterns) in a time-specific manner (temporal patterns), which leads to the differentiation of cell fates. Images of a Drosophila melanogaster embryo at a given developmental stage, each showing the expression pattern of a particular gene as revealed by a gene-specific probe, can be compared for spatial overlaps. This comparison is fundamentally important for formulating and testing gene interaction hypotheses. Expression pattern comparison is most biologically meaningful when the images compared come from a similar time point (developmental stage). In this paper, we present LdaPath, a novel formulation of Linear Discriminant Analysis (LDA) for automatic developmental stage range classification. It employs multivariate linear regression with an L1-norm penalty, controlled by a regularization parameter, for feature extraction and visualization. LdaPath computes the entire solution path for all values of the regularization parameter at essentially the same computational cost as fitting one LDA model, and thus facilitates efficient model selection. LdaPath is based on the equivalence relationship between LDA and the least squares method for multi-class classification. This equivalence relationship is established under a mild condition, which we show empirically to hold for many high-dimensional datasets, such as expression pattern images. Our experiments on a collection of 2705 expression pattern images show the effectiveness of the proposed algorithm. The results also show that the LDA model produced by LdaPath is sparse, so irrelevant features can be removed. Thus, LdaPath provides a general framework for simultaneous feature selection and feature extraction.
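The idea of an L1-penalized least squares fit to class labels, traced over a range of regularization values, can be illustrated with a minimal sketch. This is not the LdaPath algorithm itself. It assumes a centered class-indicator matrix as the regression target (the paper's exact target matrix may differ), uses synthetic data in place of expression pattern images, and substitutes warm-started coordinate descent on a grid of penalty values for a true solution-path algorithm. All names (`soft_threshold`, `lasso_path_cd`, `lambdas`, `paths`) are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_path_cd(X, y, lambdas, n_sweeps=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 for each lam in lambdas.

    Each fit is warm-started from the previous one, so the grid is cheap
    to traverse. Returns an array of shape (len(lambdas), n_features).
    """
    n_features = X.shape[1]
    col_sq = (X ** 2).sum(axis=0)
    w = np.zeros(n_features)
    path = []
    for lam in lambdas:
        for _ in range(n_sweeps):
            for j in range(n_features):
                if col_sq[j] == 0.0:
                    continue
                # Residual with feature j's current contribution removed.
                r_j = y - X @ w + X[:, j] * w[j]
                w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
        path.append(w.copy())
    return np.array(path)


# Synthetic stand-in data (assumption): 3 classes, 20 features,
# of which only the first two carry class information.
rng = np.random.default_rng(0)
n_per_class, n_classes, n_features = 20, 3, 20
labels = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(labels.size, n_features))
class_means = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
X[:, :2] += class_means[labels]
X -= X.mean(axis=0)

# Centered class-indicator target (assumption about the target encoding).
Y = np.eye(n_classes)[labels]
Y -= Y.mean(axis=0)

# At lam_max every coefficient is zero; smaller values admit more features.
lam_max = np.abs(X.T @ Y).max()
lambdas = lam_max * np.logspace(0, -2, 10)

# With an elementwise L1 penalty the multivariate fit decouples by column.
paths = np.stack(
    [lasso_path_cd(X, Y[:, c], lambdas) for c in range(n_classes)], axis=2
)  # shape: (n_lambdas, n_features, n_classes)

# Number of features used by at least one class direction at each lam.
nonzeros = (np.abs(paths) > 1e-10).any(axis=2).sum(axis=1)
print(nonzeros)
```

Reading `nonzeros` from left to right shows how features enter the model as the penalty weakens, which is the kind of sparsity versus fit trade-off that model selection along a regularization path is meant to expose.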