Somorjai R L
Biomedical Informatics Group, Institute for Biodiagnostics, National Research Council Canada, 435 Ellice Ave., Winnipeg, MB, R3B 1Y6, Canada.
Biophys Rev. 2009 Dec;1(4):201-211. doi: 10.1007/s12551-009-0023-6. Epub 2009 Nov 25.
I describe in detail the intimately connected feature extraction and classifier development stages of the data-driven Statistical Classification Strategy (SCS) and compare them with current practice in MR spectroscopy. We initially created the SCS for the analysis of MR and IR spectra of biofluids and tissues, and subsequently extended it to analyze biomedical data in general. I focus on explaining how to extract discriminatory spectral features and create robust classifiers that can reliably discriminate diseases and disease states. I discuss our approach to identifying features that retain spectral identity and provisionally relate these features (averaged subregions of the spectra) to specific chemical entities ("metabolites"). Particular emphasis is placed on describing the steps required to help create classifiers whose accuracy does not deteriorate significantly when presented with new, unknown samples. A simple but powerful extension of the discovered features to detect metabolite-metabolite (feature-feature) interactions is also sketched. I contrast the advantages and disadvantages of using either spectral signatures or explicit metabolite concentrations derived from the spectra as sets of discriminatory features. At present, no clear-cut preference is obvious and more objective comparisons will be needed. Finally, I argue that clinical requirements and exigencies strongly suggest adopting a two-phase approach to diagnosis/prognosis. In the first phase the emphasis ought to be on providing as accurate a diagnosis as possible, without any attempt to identify "biomarkers." That should be the goal of the second, research phase, with a view to providing prognosis on disease progression.
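As a rough illustration of the two feature ideas named in the abstract (features formed as averaged subregions of a spectrum, and an extension to feature-feature interactions), the following is a minimal sketch under stated assumptions, not the SCS implementation. The abstract does not say how the subregions are chosen or how interactions are encoded; here the region boundaries are fixed by hand and an interaction is assumed to be the product of two region averages. The function names `region_average_features` and `add_pairwise_interactions`, and the toy intensities, are hypothetical.

```python
from itertools import combinations


def region_average_features(spectra, regions):
    """Return one feature per region: the mean intensity over [start, stop).

    spectra: list of spectra, each a list of intensities.
    regions: list of (start, stop) index pairs (hypothetical, hand-picked here).
    """
    features = []
    for spectrum in spectra:
        row = []
        for start, stop in regions:
            segment = spectrum[start:stop]
            row.append(sum(segment) / len(segment))
        features.append(row)
    return features


def add_pairwise_interactions(features):
    """Append the product of every pair of features to each row.

    Assumption: a feature-feature interaction is modeled as a simple product.
    """
    extended = []
    for row in features:
        products = [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
        extended.append(list(row) + products)
    return extended


# Toy data: two "spectra" with six intensity points each.
spectra = [
    [1.0, 3.0, 2.0, 4.0, 6.0, 0.0],
    [2.0, 2.0, 5.0, 5.0, 1.0, 3.0],
]
regions = [(0, 2), (2, 4), (4, 6)]  # three adjacent subregions

feats = region_average_features(spectra, regions)
full = add_pairwise_interactions(feats)
```

Each averaged feature still corresponds to a contiguous stretch of the spectrum, which is what allows it to be provisionally related to a chemical entity; the product terms enlarge the feature set from 3 to 6 columns in this toy case.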