Chang Yu-Tien, Huang Chi-Shuan, Yao Chung-Tay, Su Sui-Lung, Terng Harn-Jing, Chou Hsiu-Ling, Chou Yu-Ching, Chen Kang-Hua, Shih Yun-Wen, Lu Chian-Yu, Lai Ching-Huang, Jian Chen-En, Lin Chiao-Huang, Chen Chien-Ting, Wu Yi-Syuan, Lin Ke-Shin, Wetter Thomas, Chang Chi-Wen, Chu Chi-Ming
Yu-Tien Chang, Yun-Wen Shih, Chen-En Jian, Chiao-Huang Lin, Chien-Ting Chen, Yi-Syuan Wu, Ke-Shin Lin, Chi-Ming Chu, Division of Biomedical Statistics and Informatics, School of Public Health, National Defense Medical Center, Taipei 114, Taiwan.
World J Gastroenterol. 2014 Oct 21;20(39):14463-71. doi: 10.3748/wjg.v20.i39.14463.
Optimal molecular markers for detecting colorectal cancer (CRC) in a blood-based assay were evaluated.
A case-control design matched by age and sex (111 CRC and 227 non-cancer samples) was applied. Total RNA isolated from the 338 blood samples was reverse-transcribed, and the relative transcript levels of candidate genes were analyzed. The training set consisted of 162 samples randomly selected from the 338. Logistic regression analysis was performed, and odds ratios for each gene were determined between CRC and non-cancer samples. The remaining samples (n = 176) formed the testing set, which was used to validate the logistic model and assess its generalizability. By pooling 12 public microarray datasets (GSE 4107, 4183, 8671, 9348, 10961, 13067, 13294, 13471, 14333, 15960, 17538, and 18105), which included 519 adenocarcinoma cases and 88 normal mucosa controls, we verified the genes selected by the logistic models and estimated their external generalizability.
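For readers less familiar with the method, the relationship between a logistic model and its per-gene odds ratios can be written in the standard textbook form below. The symbols are assumptions introduced here for illustration (p for the probability that a sample is CRC, x_j for the relative transcript level of gene j, and β_j for its fitted coefficient); the abstract does not state how the covariates were scaled or coded.

```latex
\[
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
\mathrm{OR}_j \;=\; e^{\beta_j}
\]
```

Under this form, an odds ratio above 1 means that a one-unit increase in x_j is associated with higher odds of CRC, and an odds ratio below 1 with lower odds, holding the other genes fixed.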
The logistic regression analysis selected five significant genes (P < 0.05; MDM2, DUSP6, CPEB4, MMD, and EIF2S3), with odds ratios of 2.978, 6.029, 3.776, 0.538 and 0.138, respectively. The five-gene model discriminated CRC cases from controls stably in the training set, with accuracies ranging from 73.9% to 87.0%, a sensitivity of 95% and a specificity of 95%. The model also performed well in the test set, with 83.5% accuracy, 66.0% sensitivity, 92.0% specificity, a positive predictive value of 89.2% and a negative predictive value of 73.0%. Multivariate logistic regression was applied to the 12 pooled public microarray datasets as external validation. Models with similar expected and observed event rates in subgroups were considered well calibrated. In this external validation, the model containing MDM2, DUSP6, CPEB4, MMD, and EIF2S3 gave the following results: Hosmer-Lemeshow (H-L) P = 0.460, R² = 0.853, AUC = 0.978, accuracy = 0.949, specificity = 0.818 and sensitivity = 0.971.
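As a reminder of how the reported performance measures relate to one another, the following minimal Python sketch computes them from confusion-matrix counts. The function name and the counts are assumptions: the abstract does not give a confusion matrix, so the counts below were chosen only because, with 519 cases and 88 controls, they are consistent with the rounded external-validation sensitivity, specificity and accuracy. The predictive values it prints are therefore illustrative, not reported results.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard diagnostic accuracy measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all cases
        "specificity": tn / (tn + fp),   # true negatives among all controls
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,   # all correct calls among all samples
    }


# Hypothetical counts (NOT from the paper): 519 cases = 504 TP + 15 FN,
# 88 controls = 72 TN + 16 FP. They were back-calculated to agree with the
# rounded rates reported for the pooled microarray validation.
metrics = diagnostic_metrics(tp=504, fn=15, tn=72, fp=16)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because predictive values depend on the case-to-control ratio, values computed from a case-enriched validation set like this one would not carry over directly to a screening population.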
A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.