Fardin Paolo, Barla Annalisa, Mosci Sofia, Rosasco Lorenzo, Verri Alessandro, Varesio Luigi
Laboratorio di Biologia Molecolare, Giannina Gaslini Institute, Largo G Gaslini 5, I-16147 Genova, Italy.
BMC Genomics. 2009 Oct 15;10:474. doi: 10.1186/1471-2164-10-474.
Gene expression signatures are clusters of genes that discriminate between different cell states, and defining them is critical for understanding the molecular bases of disease. Identifying a gene signature is complicated by the high dimensionality of the data and by the genetic heterogeneity of the responding cells. The l1-l2 regularization is an embedded feature selection technique that fulfills all the desirable properties of a variable selection algorithm and has the potential to generate a specific signature even in biologically complex settings. We studied the application of this algorithm to detect the signature characterizing the transcriptional response of neuroblastoma tumor cell lines to hypoxia, a condition of low oxygen tension that occurs in the tumor microenvironment.
We determined the gene expression profiles of 9 neuroblastoma cell lines cultured under normoxic and hypoxic conditions. We studied a heterogeneous set of neuroblastoma cell lines to mimic the in vivo situation and to test the robustness and validity of the l1-l2 regularization with double optimization. Analysis by hierarchical, spectral, and k-means clustering, or by a supervised approach based on t-test analysis, divided the cell lines on the basis of their genetic differences. The strong transcriptional differences among cell lines completely masked the more subtle response to hypoxia. Different results were obtained when we applied the l1-l2 regularization framework. The algorithm distinguished the normoxic and hypoxic statuses, defining signatures comprising 3 to 38 probesets, with a leave-one-out error of 17%. A consensus hypoxia signature was established by setting the frequency score at 50% and the correlation parameter epsilon equal to 100. This signature is composed of 11 probesets representing 8 well characterized genes known to be modulated by hypoxia.
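The selection procedure described above (l1-l2 penalized fitting, a leave-one-out error, and a frequency score thresholded at 50%) can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a generic elastic-net objective, (1/2n)||y - Xb||^2 + tau*||b||_1 + (mu/2)*||b||^2, by coordinate descent, and it omits the second optimization stage and the correlation parameter epsilon mentioned in the text. The toy matrix, the function names, and the values of `tau` and `mu` are all assumptions made for illustration. Feature 0 mimics a weak condition-related signal, feature 1 a strong cell-line effect unrelated to the label, and feature 2 noise.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, tau, mu, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + tau*||b||_1 + (mu/2)*||b||^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, and beta starts at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, tau) / (z + mu)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
    return beta


def leave_one_out(X, y, tau, mu):
    """Return the leave-one-out error and each feature's selection frequency."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    errors = 0
    for held in range(n):
        X_train = [row for i, row in enumerate(X) if i != held]
        y_train = [label for i, label in enumerate(y) if i != held]
        beta = fit_elastic_net(X_train, y_train, tau, mu)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
        score = sum(b * x for b, x in zip(beta, X[held]))
        predicted = 1 if score >= 0 else -1
        errors += predicted != y[held]
    return errors / n, [c / n for c in counts]


# Hypothetical toy data: 8 samples x 3 features, labels +1 and -1
# standing in for the two culture conditions.
X = [
    [2.0, 3.0, 0.1], [1.8, -3.0, -0.2], [2.2, 3.0, 0.1], [1.9, -3.0, 0.0],
    [-2.1, 3.0, -0.1], [-1.9, -3.0, 0.2], [-2.0, 3.0, 0.0], [-1.8, -3.0, -0.1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

loo_error, frequency = leave_one_out(X, y, tau=0.5, mu=0.1)
# Consensus signature: features selected in at least 50% of the leave-one-out fits.
consensus = [j for j, f in enumerate(frequency) if f >= 0.5]
print(loo_error, frequency, consensus)
```

On this toy matrix only feature 0 survives the l1 threshold in every fold, so the consensus list contains that single index. Real expression data would need centering and scaling, and `tau` and `mu` would normally be chosen by an inner cross-validation rather than fixed by hand.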
We demonstrate that l1-l2 regularization outperforms more conventional approaches, allowing the identification and definition of a gene expression signature under complex experimental conditions. The l1-l2 regularization and the cross validation generate an unbiased and objective output with a low classification error. We feel that the application of this algorithm to tumor biology will be instrumental in analyzing gene expression signatures hidden in the transcriptome that, like hypoxia, may be major determinants of the course of the disease.