Boutros Paul C, Lau Suzanne K, Pintilie Melania, Liu Ni, Shepherd Frances A, Der Sandy D, Tsao Ming-Sound, Penn Linda Z, Jurisica Igor
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
Proc Natl Acad Sci U S A. 2009 Feb 24;106(8):2824-8. doi: 10.1073/pnas.0809444106. Epub 2009 Feb 5.
Patients with resectable non-small-cell lung cancer (NSCLC) have a poor prognosis, with 30-50% relapsing within 5 years. Current staging criteria do not fully capture the complexity of this disease. Survival could be improved by identifying those early-stage patients who are most likely to benefit from adjuvant therapy. Molecular classification using mRNA expression profiles has led to multiple, poorly overlapping signatures. We hypothesized that differing statistical methodologies contribute to this lack of overlap. To test this hypothesis, we analyzed our previously published quantitative RT-PCR dataset with a semisupervised method. A 6-gene signature was identified and validated in 4 independent public microarray datasets that represent a range of tumor histologies and stages. This result demonstrated that at least 2 prognostic signatures can be derived from this single dataset. We next estimated the total number of prognostic signatures in this dataset with a 10-million-signature permutation study. Our 6-gene signature was among the top 0.02% of signatures with maximum verifiability, reaffirming its efficacy. Importantly, this analysis identified 1,789 unique signatures, implying that our dataset contains >500,000 verifiable prognostic signatures for NSCLC. This result appears to rationalize the observed lack of overlap among reported NSCLC prognostic signatures.
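The step from 1,789 signatures found among 10 million sampled to >500,000 in the whole dataset can be read as a proportional scale-up. The abstract does not state how the estimate was made or how large the space of candidate signatures is, so the following is only a sketch under two assumptions: that the 1,789 signatures are the hits within the 10 million sampled, and that the hit fraction in the sample carries over to the full candidate space. The function names and the `space_size` parameter are hypothetical.

```python
def estimate_total_signatures(hits: int, sampled: int, space_size: int) -> float:
    """Scale the hit fraction seen in a random sample up to the full candidate space.

    Assumption: the sample is representative, so hits / sampled approximates
    the fraction of verifiable signatures among all candidates.
    """
    if sampled <= 0 or not 0 <= hits <= sampled:
        raise ValueError("need sampled > 0 and 0 <= hits <= sampled")
    return hits * space_size / sampled


def min_space_for_estimate(target_total: int, hits: int, sampled: int) -> int:
    """Smallest candidate-space size whose proportional estimate reaches target_total."""
    if hits <= 0 or sampled <= 0:
        raise ValueError("need hits > 0 and sampled > 0")
    # Ceiling division using integers only, to avoid floating-point rounding.
    return -(-target_total * sampled // hits)


if __name__ == "__main__":
    HITS = 1_789            # unique signatures reported in the abstract
    SAMPLED = 10_000_000    # signatures tested in the permutation study
    TARGET = 500_000        # lower bound reported in the abstract

    # Hypothetical question: how large must the candidate space be for a
    # simple proportional scale-up to reach the reported lower bound?
    needed = min_space_for_estimate(TARGET, HITS, SAMPLED)
    print(f"candidate space needed: {needed:,}")
    print(f"estimate at that size: {estimate_total_signatures(HITS, SAMPLED, needed):,.1f}")
```

Under these assumptions, the reported lower bound corresponds to a candidate space of a few billion signatures; the actual size depends on the number of genes profiled and the signature sizes considered, neither of which is given in the abstract.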