Qin Li-Xuan, Zhou Qin, Bogomolniy Faina, Villafania Liliana, Olvera Narciso, Cavatore Magali, Satagopan Jaya M, Begg Colin B, Levine Douglas A
Authors' Affiliations: Departments of Epidemiology and Biostatistics and
Clin Cancer Res. 2014 Jul 1;20(13):3371-8. doi: 10.1158/1078-0432.CCR-13-3155. Epub 2014 May 1.
Randomization and blocking have the potential to prevent the negative impacts of nonbiologic effects on molecular biomarker discovery. Their use in practice, however, has been rare. To demonstrate the logistical feasibility and scientific benefits of randomization and blocking, we conducted a microRNA study of endometrial tumors (n = 96) and ovarian tumors (n = 96) using a blocked randomization design to control for nonbiologic effects; we then profiled the same set of tumors a second time without blocking or randomization. We assessed the empirical evidence of differential expression in the two studies and performed simulations through virtual rehybridizations to further evaluate the effects of blocking and randomization. In the randomized dataset, differential expression between endometrial and ovarian tumors was moderate and asymmetric (351/3,523 markers, 10%). Nonbiologic effects were observed in the nonrandomized dataset, in which 1,934 markers (55%) were called differentially expressed. Of these, 185 were also called differentially expressed in the randomized dataset (185/351, 53%) and 1,749 were not (1,749/3,172, 55%). In simulations, when randomization was applied to all samples at once or within batches of samples balanced between tumor groups, blocking raised the true-positive rate from 0.95 to 0.97 and lowered the false-positive rate from 0.02 to 0.002; when sample batches were unbalanced between tumor groups, randomization yielded a true-positive rate of 0.92 and a false-positive rate of 0.10 regardless of blocking. Normalization improved the detection of true-positive markers but still left a sizeable number of false-positive markers. Randomization and blocking should be used in practice to more fully reap the benefits of genomics technologies.
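A blocked randomization of the kind named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the block size of 8 (for example, arrays sharing one slide), the sample IDs, and the helper names `blocked_randomization` and `overlap_rates` are all hypothetical. Each block receives equal numbers of endometrial and ovarian tumors in random positions, and the block processing order is also shuffled, so tumor group is not confounded with block or run order. The second helper shows how the abstract's 185/351 and 1,749/3,172 fractions arise when the randomized dataset's calls are treated as the reference; the marker indices used there are synthetic placeholders chosen only to match the reported counts.

```python
import random
from collections import Counter


def blocked_randomization(samples_by_group, block_size, seed=None):
    """Assign samples to blocks (hypothetically, array slides) so every block
    holds the same number of samples from each group; positions within each
    block and the order of the blocks are both randomized."""
    groups = sorted(samples_by_group)
    if block_size % len(groups):
        raise ValueError("block_size must be a multiple of the number of groups")
    per_group = block_size // len(groups)
    n = len(samples_by_group[groups[0]])
    if any(len(samples_by_group[g]) != n for g in groups) or n % per_group:
        raise ValueError("groups must be equal-sized and fill whole blocks")
    rng = random.Random(seed)
    # Randomize which samples of each group end up together in a block.
    shuffled = {g: rng.sample(list(samples_by_group[g]), n) for g in groups}
    blocks = []
    for j in range(n // per_group):
        lo, hi = j * per_group, (j + 1) * per_group
        block = [(sid, g) for g in groups for sid in shuffled[g][lo:hi]]
        rng.shuffle(block)  # randomize positions within the block
        blocks.append(block)
    rng.shuffle(blocks)  # randomize the processing order of blocks
    return blocks


def overlap_rates(test_calls, reference_calls, all_markers):
    """Fractions of reference-positive and reference-negative markers that
    the test dataset also calls differentially expressed."""
    ref_pos = reference_calls & all_markers
    ref_neg = all_markers - reference_calls
    return (len(test_calls & ref_pos) / len(ref_pos),
            len(test_calls & ref_neg) / len(ref_neg))


# Usage: 96 endometrial and 96 ovarian tumors, blocks of 8 (assumed size).
endometrial = [f"END{i:02d}" for i in range(1, 97)]
ovarian = [f"OV{i:02d}" for i in range(1, 97)]
blocks = blocked_randomization(
    {"endometrial": endometrial, "ovarian": ovarian}, block_size=8, seed=1)
print(len(blocks))                       # 24
print(Counter(g for _, g in blocks[0]))  # each group appears 4 times

# Synthetic marker indices that only reproduce the reported counts.
markers = set(range(3523))
randomized_calls = set(range(351))           # 351 reference calls
nonrandomized_calls = set(range(166, 2100))  # 1,934 calls: 185 + 1,749
tpr, fpr = overlap_rates(nonrandomized_calls, randomized_calls, markers)
print(round(tpr, 3), round(fpr, 3))          # 0.527 0.551
```

Shuffling within groups before filling blocks keeps every block balanced by construction, which is what distinguishes blocking from a single unrestricted shuffle of all 192 samples.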