University Health Network, Ontario Cancer Institute/Princess Margaret Hospital, Ontario, Canada.
Clin Lung Cancer. 2009 Sep;10(5):331-40. doi: 10.3816/CLC.2009.n.045.
In non-small-cell lung cancer (NSCLC), molecular profiling of tumors has led to the identification of gene expression patterns that are associated with specific phenotypes and prognosis. Such correlations could identify early-stage patients who are at increased risk of disease recurrence and death after complete surgical resection and who might benefit from adjuvant therapy. Profiling may also identify aberrant molecular pathways that might lead to specific molecularly targeted therapies. The technology for capturing molecular profiles and correlating them with clinical and biologic endpoints has evolved rapidly since microarrays were first developed a decade ago. In this review, we discuss multiple methods that have been used to derive prognostic gene expression signatures in NSCLC. Despite the diversity of approaches, 3 main steps are generally followed. First, the expression levels of several hundred to tens of thousands of genes are quantified by microarray or quantitative polymerase chain reaction techniques; the data are then preprocessed, normalized, and possibly filtered. Second, expression data are combined and grouped by clustering, risk score generation, or other means to generate a gene signature that correlates with a clinical outcome, usually survival. Finally, the signature is validated in datasets from independent cohorts. This review discusses the concepts and methodologies involved in these analytical steps, primarily to help readers understand reports of large-dataset gene expression studies that focus on prognostic signatures in NSCLC.
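The 3 steps described in the abstract can be illustrated with a minimal sketch. It is not taken from the review: the gene names, intensities, weights, variance threshold, and median cutoff are all hypothetical, and the specific choices (log2 transform, per-sample median centering, a weighted-sum risk score) are assumed here as one common way to fill in each step, not the method of any particular study.

```python
import math
from statistics import median, pvariance


def preprocess(samples):
    """Step 1: log2-transform raw intensities, then median-center each sample."""
    processed = []
    for raw in samples:
        logged = {gene: math.log2(value + 1.0) for gene, value in raw.items()}
        center = median(logged.values())
        processed.append({gene: x - center for gene, x in logged.items()})
    return processed


def filter_genes(samples, min_variance):
    """Step 1 (optional filter): keep genes that vary across samples."""
    genes = list(samples[0].keys())
    return [g for g in genes if pvariance([s[g] for s in samples]) > min_variance]


def risk_score(sample, weights):
    """Step 2: combine signature genes into one score (weighted sum)."""
    return sum(w * sample[gene] for gene, w in weights.items())


def classify(scores, cutoff):
    """Step 2: split patients into risk groups at a fixed cutoff."""
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical training cohort (raw intensities for three made-up genes).
training_raw = [
    {"G1": 1, "G2": 7, "G3": 3},
    {"G1": 15, "G2": 3, "G3": 7},
    {"G1": 3, "G2": 3, "G3": 3},
]
training = preprocess(training_raw)
kept_genes = filter_genes(training, min_variance=0.1)

# Hypothetical weights; in practice these would be estimated from outcome data.
weights = {"G1": 1.0, "G2": -1.0}
training_scores = [risk_score(s, weights) for s in training]
cutoff = median(training_scores)
training_groups = classify(training_scores, cutoff)

# Step 3: validation. Weights and cutoff stay locked; only new data go in.
validation_raw = [{"G1": 31, "G2": 1, "G3": 7}]
validation = preprocess(validation_raw)
validation_scores = [risk_score(s, weights) for s in validation]
validation_groups = classify(validation_scores, cutoff)
```

The point of the last block is that nothing is re-estimated on the independent cohort: the signature genes, weights, and cutoff are fixed from the training data. Whether the resulting groups actually differ in survival would then be assessed with a survival analysis, which this sketch omits.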