Martin Francis L, German Matthew J, Wit Ernst, Fearn Thomas, Ragavan Narasimhan, Pollock Hubert M
Biomedical Sciences Unit, Lancaster University, Lancaster, United Kingdom.
J Comput Biol. 2007 Nov;14(9):1176-84. doi: 10.1089/cmb.2007.0057.
In the biomedical field, infrared (IR) spectroscopic studies can involve processing data derived from many samples, divided into classes such as tissue category (e.g., normal or cancerous) or patient identity. Reliable methods are needed to identify class-specific information: which wavenumbers, representing various molecular groups, are responsible for the observed class groupings. Using a prostate tissue sample divided into three regions (transition zone, peripheral zone, and adjacent adenocarcinoma) and interrogated by synchrotron Fourier-transform IR microspectroscopy, we compared two statistical methods: (a) a new "cluster vector" version of principal component analysis (PCA), in which the dimensionality of the dataset is reduced and linear discriminant analysis (LDA) is then applied to reveal clusters, through each of which a vector is constructed that identifies the contributory wavenumbers; and (b) stepwise LDA, which exploits the fact that spectral peaks identifying particular chemical bonds extend over several wavenumbers: after classification using either one or two wavenumbers, it checks whether the resulting predictions remain stable across a range of nearby wavenumbers. Stepwise LDA is the simpler of the two methods; the cluster vector approach can indicate which of the spectral classes exhibit the significant signal differences seen at the "prominent" wavenumbers identified. Where IR spectra are found to separate into classes, the excellent agreement between these two quite different methods points to what will prove to be a new and reliable approach for establishing which molecular groups are responsible for the separation.
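The cluster vector idea in (a) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that a class's cluster vector is its centroid in LDA space (measured from the overall mean), mapped back to wavenumber space through the combined PCA and LDA loadings, and that the largest absolute entries of that vector mark the "prominent" wavenumbers. The function name `pca_lda_cluster_vectors`, the default `n_pcs`, and the synthetic spectra (flat baseline plus Gaussian bands at hypothetical positions) are illustrative choices only.

```python
import numpy as np


def pca_lda_cluster_vectors(X, y, n_pcs=5):
    """Sketch of a PCA-LDA 'cluster vector' analysis (assumed construction).

    X: (n_spectra, n_wavenumbers) array of spectra; y: (n_spectra,) class labels.
    Returns the sorted class labels and one cluster vector per class, expressed
    in wavenumber space. Requires n_pcs <= min(X.shape) and enough spectra per
    class for the within-class scatter matrix to be invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # 1) PCA via SVD of the mean-centred spectra.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T                      # PCA loadings: (n_wavenumbers, n_pcs)
    T = Xc @ P                            # PC scores: (n_spectra, n_pcs), column means 0

    # 2) LDA on the PC scores: within-class (Sw) and between-class (Sb) scatter.
    Sw = np.zeros((n_pcs, n_pcs))
    Sb = np.zeros((n_pcs, n_pcs))
    for c in classes:
        Tc = T[y == c]
        m = Tc.mean(axis=0)
        d = Tc - m
        Sw += d.T @ d
        Sb += len(Tc) * np.outer(m, m)    # overall mean of T is zero
    # Solve Sb w = lambda Sw w symmetrically by whitening with Sw^(-1/2).
    w_evals, w_evecs = np.linalg.eigh(Sw)
    Sw_isqrt = w_evecs @ np.diag(1.0 / np.sqrt(w_evals)) @ w_evecs.T
    _, E = np.linalg.eigh(Sw_isqrt @ Sb @ Sw_isqrt)
    n_ld = len(classes) - 1
    W = Sw_isqrt @ E[:, ::-1][:, :n_ld]   # leading discriminant directions (PC space)

    # 3) Map each class centroid in LD space back to wavenumber space.
    Z = T @ W                             # LD scores
    L = P @ W                             # combined loadings: (n_wavenumbers, n_ld)
    vectors = np.array([L @ Z[y == c].mean(axis=0) for c in classes])
    return classes, vectors


# Usage on synthetic spectra: class 0 is flat, classes 1 and 2 carry a
# Gaussian band centred at hypothetical positions 10 and 40 (60 wavenumbers).
rng = np.random.default_rng(0)
wn = np.arange(60)


def band(center, width=2.0):
    return np.exp(-((wn - center) ** 2) / (2 * width ** 2))


X = np.vstack([rng.normal(0.0, 0.05, (30, 60)) + base
               for base in (np.zeros(60), band(10), band(40))])
y = np.repeat([0, 1, 2], 30)
classes, vectors = pca_lda_cluster_vectors(X, y, n_pcs=5)
for c, v in zip(classes, vectors):
    # The largest |entries| of each cluster vector flag its "prominent" wavenumbers.
    print(c, np.argsort(-np.abs(v))[:3])
```

Because each cluster vector is tied to one class centroid, its signed entries show which class is raised or lowered at a given wavenumber, which is the kind of class-specific reading the abstract attributes to the cluster vector approach.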
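The stability check in stepwise LDA (b) can be illustrated with a similarly hedged sketch: fit a one-wavenumber LDA at every position, then keep only wavenumbers whose classification accuracy stays high across a small window of neighbours, reflecting the idea that a genuine band extends over several wavenumbers. The names `lda_1d_accuracy` and `stable_wavenumbers`, the use of resubstitution (training-set) accuracy, and the `half_window` and `threshold` values are assumptions for illustration; the two-wavenumber variant and the authors' exact stability criterion are not reproduced here.

```python
import numpy as np


def lda_1d_accuracy(x, y):
    """Resubstitution accuracy of a one-wavenumber LDA classifier.

    Uses class means, class priors and a pooled within-class variance.
    """
    classes = np.unique(y)
    means = np.array([x[y == c].mean() for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = np.concatenate([x[y == c] - m for c, m in zip(classes, means)])
    var = resid @ resid / (len(x) - len(classes))     # pooled variance
    # Linear discriminant score per class: x*mu/var - mu^2/(2*var) + log(prior)
    scores = (np.outer(x, means) - 0.5 * means ** 2) / var + np.log(priors)
    pred = classes[np.argmax(scores, axis=1)]
    return np.mean(pred == y)


def stable_wavenumbers(X, y, half_window=2, threshold=0.9):
    """Return per-wavenumber accuracy and the indices whose accuracy stays at or
    above `threshold` for every neighbour within +/- `half_window` positions."""
    n_wn = X.shape[1]
    acc = np.array([lda_1d_accuracy(X[:, j], y) for j in range(n_wn)])
    stable = [j for j in range(n_wn)
              if acc[max(0, j - half_window):j + half_window + 1].min() >= threshold]
    return acc, stable


# Usage on synthetic spectra: class 1 carries a band centred at hypothetical
# position 30; class 0 is flat.
rng = np.random.default_rng(1)
wn = np.arange(60)
peak = np.exp(-((wn - 30) ** 2) / 8.0)
X = np.vstack([rng.normal(0.0, 0.1, (40, 60)),
               rng.normal(0.0, 0.1, (40, 60)) + peak])
y = np.repeat([0, 1], 40)
acc, stable = stable_wavenumbers(X, y)
print(stable)
```

A single noisy wavenumber that happens to classify well by chance is rejected here because its neighbours do not; in practice a held-out or cross-validated accuracy would be a safer basis than resubstitution.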