Cheng Jie, Greshock Joel, Painter Jeffery, Lin Xiwu, Lee Kwan, Zheng Shu, Menius Alan
Quantitative Sciences, GlaxoSmithKline, Collegeville, PA 19426, USA.
J Integr Bioinform. 2012 Aug 2;9(2):209. doi: 10.2390/biecoll-jib-2012-209.
We developed a novel tool for microarray data analysis that parsimoniously discovers highly predictive genes by finding the optimal trade-off between fold change and t-test p-value through rigorous cross-validation. In addition to finding a small set of highly predictive genes, the tool includes a procedure that recursively discovers and removes predictive genes from the dataset until no more such genes can be found. We applied the tool to a public breast cancer dataset with the goal of discovering genes that predict a patient's response to preoperative chemotherapy. The results show that the estrogen receptor (ER) gene is the most important gene for predicting chemotherapeutic response, and that no gene signature adds much clinical benefit for the patient population as a whole. We further identified a clinically homogeneous subgroup of patients (ER-negative, PR-negative and HER2-negative) whose response to the chemotherapy can be reasonably predicted. Many of the predictive markers discovered for this subgroup were successfully validated using a blinded validation set.
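The abstract does not give implementation details, so the following is only a minimal sketch of how such a selection loop might look, not the authors' tool. It assumes log2 expression values, a normal approximation to Welch's t-test for the p-value, a simple signed-distance classifier, and a fixed accuracy cutoff (`min_acc`) as the stopping rule. All function names, thresholds, and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def gene_stats(X, y, g):
    """Return (log2 fold change, approximate p-value, class midpoint) for gene g.

    X holds log2 expression values, so the fold change is a difference of means.
    The p-value is a normal approximation to Welch's t-test (an assumption made
    here to stay within the standard library). Zero-variance genes get p = 1.
    """
    a = [row[g] for row, label in zip(X, y) if label == 1]
    b = [row[g] for row, label in zip(X, y) if label == 0]
    lfc = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = lfc / se if se > 0 else 0.0
    return lfc, math.erfc(abs(t) / math.sqrt(2)), (mean(a) + mean(b)) / 2


def select(X, y, genes, fc_min, p_max):
    """Keep genes that pass both the fold-change and the p-value threshold."""
    model = []
    for g in genes:
        lfc, p, mid = gene_stats(X, y, g)
        if abs(lfc) >= fc_min and p <= p_max:
            model.append((g, lfc, mid))
    return model


def predict(row, model):
    """Sum signed distances from each gene's midpoint; positive means class 1."""
    score = sum(math.copysign(1.0, lfc) * (row[g] - mid) for g, lfc, mid in model)
    return 1 if score > 0 else 0


def cv_accuracy(X, y, genes, fc_min, p_max, k=5):
    """k-fold cross-validated accuracy; genes are re-selected inside each fold."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in test]
        y_train = [y[i] for i in range(n) if i not in test]
        model = select(X_train, y_train, genes, fc_min, p_max)
        if not model:
            return 0.0  # thresholds too strict: no gene survives in this fold
        correct += sum(predict(X[i], model) == y[i] for i in test)
    return correct / n


def best_thresholds(X, y, genes, fc_grid, p_grid):
    """Grid-search the threshold pair with the best CV accuracy.

    Ties are broken toward the larger fold-change cutoff, then the larger p cutoff.
    """
    return max((cv_accuracy(X, y, genes, fc, p), fc, p)
               for fc in fc_grid for p in p_grid)


def recursive_discovery(X, y, fc_grid, p_grid, min_acc=0.8):
    """Find a predictive gene set, remove it, and repeat until none is found."""
    remaining = list(range(len(X[0])))
    rounds = []
    while remaining:
        acc, fc, p = best_thresholds(X, y, remaining, fc_grid, p_grid)
        if acc < min_acc:
            break
        found = [g for g, _, _ in select(X, y, remaining, fc, p)]
        if not found:
            break
        rounds.append((found, acc))
        remaining = [g for g in remaining if g not in found]
    return rounds


def toy_data(n=40, n_genes=6, seed=0):
    """Hypothetical data: genes 0 and 1 are shifted by 4 log2 units in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(4.0 * label if g < 2 else 0.0, 0.5) for g in range(n_genes)]
         for label in y]
    return X, y


X, y = toy_data()
rounds = recursive_discovery(X, y, fc_grid=[1.0, 2.0], p_grid=[0.01, 0.05])
print(rounds)
```

Re-selecting genes inside each fold keeps the held-out samples out of the feature selection step, which is the usual way to avoid optimistic accuracy estimates in this kind of analysis.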