Läuter Jürgen, Horn Friedemann, Rosołowski Maciej, Glimm Ekkehard
Interdisciplinary Centre for Bioinformatics (IZBI), University of Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany.
Biom J. 2009 Apr;51(2):235-51. doi: 10.1002/bimj.200800207.
The paper presents effective and mathematically exact procedures for variable selection that are applicable in very high-dimensional settings such as gene expression analysis. Selecting sets of variables, rather than single variables, is an important way to increase the power of the statistical conclusions and to facilitate biological interpretation. To construct the sets, each single variable is treated as the centre of potential sets of variables. Significance is tested either with the resampling-based Westfall-Young principle or with the parametric method of spherical tests. The particular requirements for statistical stability are taken into account, so that overfitting is avoided. As a result, high power is attained and the familywise type I error rate is controlled despite the large dimension. A specific data compression technique is applied to obtain graphical representations as heat maps and curves. Gene expression data from B-cell lymphoma patients are used to demonstrate the procedures.
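As a rough illustration of the resampling idea behind the Westfall-Young principle mentioned above, the following sketch computes single-step maxT adjusted p-values for a two-group comparison of individual variables. It is not the authors' procedure: the paper works with sets of variables centred on each variable and also uses spherical tests, neither of which is reproduced here. The function names (`welch_t`, `maxt_adjusted_pvalues`), the choice of a Welch-type t statistic, and the simulated data are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Absolute two-sample t statistic allowing unequal variances."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return abs(mean(x) - mean(y)) / se if se > 0 else 0.0


def maxt_adjusted_pvalues(data, labels, n_perm=500, seed=0):
    """Single-step maxT adjustment (generic sketch, hypothetical helper).

    data   : list of variables, each a list with one value per sample
    labels : list of 0/1 group labels, one per sample
    """
    rng = random.Random(seed)

    def all_stats(lab):
        stats = []
        for row in data:
            x = [v for v, g in zip(row, lab) if g == 0]
            y = [v for v, g in zip(row, lab) if g == 1]
            stats.append(welch_t(x, y))
        return stats

    observed = all_stats(labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)                 # permute group labels jointly for all variables
        max_stat = max(all_stats(perm))   # maximum statistic over all variables
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # The add-one correction keeps the estimated p-values strictly positive.
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Simulated example: 20 variables, 12 samples, only variable 0 differs between groups.
sim = random.Random(1)
labels = [0] * 6 + [1] * 6
data = [[sim.gauss(0.0, 1.0) for _ in labels] for _ in range(20)]
data[0] = [v + (10.0 if g == 1 else 0.0) for v, g in zip(data[0], labels)]

adjusted = maxt_adjusted_pvalues(data, labels)
print(min(adjusted), adjusted.index(min(adjusted)))
```

Comparing each observed statistic with the permutation distribution of the maximum over all variables is what gives familywise error control in this kind of scheme, because the label permutation preserves the dependence between variables. A step-down variant would generally be more powerful, and is omitted here for brevity.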