Qin Li-Xuan, Levine Douglas A
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
BMC Med Genomics. 2016 Jun 10;9(1):27. doi: 10.1186/s12920-016-0187-4.
Accurate discovery of molecular biomarkers that are prognostic of a clinical outcome is an important yet challenging task, partly because the genomic signal for a clinical outcome is typically weak while the noise from microarray handling effects is frequently strong. Effective strategies to resolve this challenge are urgently needed.
We set out to assess the use of careful study design and data normalization for the discovery of prognostic molecular biomarkers. Taking progression-free survival in advanced serous ovarian cancer as an example, we conducted an empirical analysis of two sets of microRNA arrays for the same set of tumor samples: arrays in one set were collected using careful study design (that is, uniform handling and randomized array-to-sample assignment) and arrays in the other set were not.
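As an illustration of randomized array-to-sample assignment, the following is a minimal sketch and not the authors' actual procedure. The function name `randomize_assignment`, the sample IDs, and the slide/position labels are hypothetical, and a fixed seed is assumed so the assignment can be reproduced.

```python
import random


def randomize_assignment(sample_ids, array_slots, seed=0):
    """Randomly pair each sample with exactly one array slot.

    Hypothetical helper for illustration only.
    """
    if len(sample_ids) != len(array_slots):
        raise ValueError("need exactly one array slot per sample")
    rng = random.Random(seed)      # seeded generator so the design is reproducible
    shuffled = list(sample_ids)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Slots keep their physical order; samples are placed into them at random.
    return dict(zip(array_slots, shuffled))


# Hypothetical example: 8 tumor samples placed on 2 slides with 4 positions each.
samples = [f"tumor_{i:02d}" for i in range(1, 9)]
slots = [f"slide{s}_pos{p}" for s in (1, 2) for p in (1, 2, 3, 4)]
assignment = randomize_assignment(samples, slots, seed=42)
for slot, sample in assignment.items():
    print(slot, sample)
```

Randomizing which sample goes to which slot is meant to keep slide or position effects from lining up systematically with the clinical outcome, although chance imbalance can still occur in any single randomization.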
We found that (1) handling effects can confound the clinical outcome under study as a result of chance even with randomization, (2) the level of confounding handling effects can be reduced by data normalization, and (3) good study design cannot be replaced by post-hoc normalization. In addition, we provided a practical approach to define positive and negative control markers for detecting handling effects and assessing the performance of a normalization method.
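The abstract does not say which normalization methods were evaluated. Purely as an example of what post-hoc data normalization can look like, the sketch below implements quantile normalization, a commonly used method for array data, in plain Python. The function name `quantile_normalize` and the toy intensities are assumptions for illustration, and ties are handled naively (by original order).

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of intensities per array).

    All arrays must list the same markers in the same order.
    Illustrative sketch only; ties are broken by original position.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of markers")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # marker indices, ascending by value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]                # replace value by reference quantile
        normalized.append(out)
    return normalized


# Toy intensities for 3 markers on 2 arrays (made-up numbers).
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(raw))
```

After normalization every array shares the same empirical distribution, which removes array-wide shifts but cannot undo a handling effect that is confounded with the outcome, in line with finding (3).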
Our work showcased the difficulty of finding prognostic biomarkers for a clinical outcome with weak genomic signal, illustrated the benefits of careful study design and data normalization, and provided a practical approach to identify handling effects and select a beneficial normalization method. It calls for careful study design and data analysis in the discovery of robust and translatable molecular biomarkers.