Berge Claude, Froloff Nicolas, Kalathur Ravi Kiran Reddy, Maumy Myriam, Poch Olivier, Raffelsberger Wolfgang, Wicker Nicolas
F. Hoffmann-La Roche Ltd., Basel, Switzerland.
J Comput Biol. 2010 May;17(5):723-32. doi: 10.1089/cmb.2009.0126.
Large multidimensional data matrices are common in biology. However, statistical methods often have difficulty handling such matrices because the data sets they contain are very complex. Consequently, variable selection and dimensionality reduction methods are often used to reduce matrix complexity, although at the cost of losing information. A new method derived from multidimensional scaling (MDS) is presented for the case where two matrices describe the same population. The method transforms one of the matrices, called the target matrix, under some constraints so that it fits the second matrix, referred to as the reference matrix. The fit to the reference matrix is performed on the distances computed for the two matrices, and the form of the transformation depends on the problem at hand. A special feature of this method is that a variable can be only partially modified. The method is applied first to the exclusive-or (XOR) problem and then to a biological application with large-scale gene expression data.
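The abstract describes the distance-based fit only in words. The following Python sketch illustrates one simple way such a fit could be set up. It is not the authors' algorithm, and every name in it (pairwise_distances, objective, fit_target, mask, lam) is hypothetical. It assumes Euclidean distances, a raw-stress criterion as in metric MDS, and an additive displacement of the target matrix found by plain gradient descent. It also mimics "partial" modification of variables with a column mask plus a ridge penalty that keeps the displacement small. The constraints and transformations used in the paper may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pairwise_distances(rows: Sequence[Sequence[float]]) -> Matrix:
    """Euclidean distance matrix between the rows (individuals) of a data matrix."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def objective(z: Matrix, ref_dist: Matrix, delta: Matrix, lam: float) -> float:
    """Raw stress between the distances of z and the reference distances,
    plus a ridge penalty lam * ||delta||^2 on the modification (assumption)."""
    n = len(z)
    stress = sum(
        (math.dist(z[i], z[j]) - ref_dist[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return stress + lam * sum(v * v for row in delta for v in row)


def fit_target(
    target: Matrix,
    reference: Matrix,
    mask: Sequence[bool],
    lam: float = 0.01,
    step: float = 0.01,
    iters: int = 2000,
) -> Tuple[Matrix, float]:
    """Hypothetical sketch: learn an additive displacement delta so that the
    distances of target + delta match the distances of the reference matrix.
    Only the variables (columns) with mask[k] == True may be modified."""
    n, p = len(target), len(target[0])
    ref_dist = pairwise_distances(reference)
    delta = [[0.0] * p for _ in range(n)]
    for _ in range(iters):
        z = [[target[i][k] + delta[i][k] for k in range(p)] for i in range(n)]
        # Gradient of the ridge penalty.
        grad = [[2.0 * lam * delta[i][k] for k in range(p)] for i in range(n)]
        # Gradient of the stress w.r.t. z_i: 2 (d_ij - D_ij) (z_i - z_j) / d_ij;
        # the gradient w.r.t. z_j has the opposite sign.
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(z[i], z[j])
                if d == 0.0:
                    continue  # distance is not differentiable at 0; skip this pair
                coef = 2.0 * (d - ref_dist[i][j]) / d
                for k in range(p):
                    g = coef * (z[i][k] - z[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        # Plain gradient step, applied only to the modifiable variables.
        for i in range(n):
            for k in range(p):
                if mask[k]:
                    delta[i][k] -= step * grad[i][k]
    fitted = [[target[i][k] + delta[i][k] for k in range(p)] for i in range(n)]
    return fitted, objective(fitted, ref_dist, delta, lam)


if __name__ == "__main__":
    # Toy data invented for illustration: 4 individuals, 2 target variables,
    # 1 reference variable; only the first target variable may move.
    target = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    reference = [[0.0], [2.0], [0.5], [2.5]]
    fitted, final = fit_target(target, reference, mask=[True, False])
    print([[round(v, 3) for v in row] for row in fitted], round(final, 4))
```

Because only the masked-in columns receive gradient steps, the second target variable in the toy run stays exactly as it was, while the first is shifted to reduce the mismatch between the two distance matrices. Raising lam trades goodness of fit for smaller modifications of the target matrix.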