Statnikov Alexander, Li Chun, Aliferis Constantin F
Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
PLoS One. 2007 Sep 26;2(9):e958. doi: 10.1371/journal.pone.0000958.
BACKGROUND: The development of new high-throughput genotyping technologies has allowed fast evaluation of single nucleotide polymorphisms (SNPs) on a genome-wide scale. Several recent genome-wide association studies employing these technologies suggest that panels of SNPs can be a useful tool for predicting cancer susceptibility and for discovering potentially important new disease loci.
METHODOLOGY/PRINCIPAL FINDINGS: In this paper we carefully examine the relative significance of genetic factors, environmental factors, and biases in the data analysis protocol used in a previously published genome-wide association study. That prior study reported nearly perfect discrimination between esophageal cancer patients and healthy controls based on genetic information alone. In contrast, our results strongly suggest that the SNPs in this dataset are not statistically associated with the phenotype, while several environmental factors, and especially family history of esophageal cancer (a proxy for both environmental and genetic factors), have only a modest association with the disease.
CONCLUSIONS/SIGNIFICANCE: The main part of the previously claimed strong discriminatory signal is attributable to several data analysis pitfalls that, in combination, produced overly optimistic results. Such pitfalls are preventable and should be avoided in future studies, since they lead to misleading conclusions and generate many false leads for subsequent research.
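The abstract does not name the specific pitfalls. As a generic, hypothetical illustration (not the authors' analysis and not the prior study's protocol), the sketch below shows one well-known way an analysis protocol can report strong discrimination from SNPs that carry no signal at all: selecting SNPs on the full dataset before cross-validating. All data are synthetic. The function names, sample sizes, number of selected SNPs `k`, and the nearest-centroid classifier are assumptions chosen only for brevity.

```python
import random


def simulate_noise_snps(n_cases, n_controls, n_snps, seed=0):
    """Synthetic genotypes (0/1/2 allele counts) drawn independently of the
    case/control label, so no SNP carries any real signal (illustrative only)."""
    rng = random.Random(seed)
    n = n_cases + n_controls
    # Each genotype is the sum of two fair coin flips (allele frequency 0.5).
    X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n)]
    y = [1] * n_cases + [0] * n_controls
    return X, y


def top_snps(X, y, rows, k):
    """Indices of the k SNPs with the largest absolute case-control difference
    in mean genotype, computed only over the samples listed in `rows`."""
    cases = [r for r in rows if y[r] == 1]
    controls = [r for r in rows if y[r] == 0]
    scores = []
    for j in range(len(X[0])):
        m1 = sum(X[r][j] for r in cases) / len(cases)
        m0 = sum(X[r][j] for r in controls) / len(controls)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def centroid_predict(X, y, train, test, snps):
    """Hypothetical nearest-centroid classifier restricted to the chosen SNPs."""
    def centroid(label):
        members = [r for r in train if y[r] == label]
        return [sum(X[r][j] for r in members) / len(members) for j in snps]

    c1, c0 = centroid(1), centroid(0)
    x = [X[test][j] for j in snps]
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    return 1 if d1 < d0 else 0


def loocv_accuracy(X, y, k, select_inside_cv):
    """Leave-one-out accuracy. If select_inside_cv is False, SNPs are chosen
    once on ALL samples, including each held-out one (the leaky protocol)."""
    n = len(y)
    everyone = list(range(n))
    leaky_snps = None if select_inside_cv else top_snps(X, y, everyone, k)
    correct = 0
    for test in range(n):
        train = [r for r in everyone if r != test]
        # Honest protocol: re-select SNPs using the training fold only.
        snps = top_snps(X, y, train, k) if select_inside_cv else leaky_snps
        correct += centroid_predict(X, y, train, test, snps) == y[test]
    return correct / n


X, y = simulate_noise_snps(n_cases=20, n_controls=20, n_snps=1000, seed=0)
leaky = loocv_accuracy(X, y, k=20, select_inside_cv=False)
honest = loocv_accuracy(X, y, k=20, select_inside_cv=True)
print(f"selection outside CV: {leaky:.2f}   selection inside CV: {honest:.2f}")
```

On noise data like this, the leaky estimate typically lands far above 0.5, while the honest estimate stays near chance (it can even fall somewhat below 0.5, because leave-one-out with balanced classes slightly penalizes centroid-based predictions). The exact values depend on the seed and the assumed settings.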