Axelrod David E, Miller Naomi, Chapman Judith-Anne W
Department of Genetics and Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854-8082 U.S.A.
Biomed Inform Insights. 2009 Jan 1;2:11-18. doi: 10.4137/bii.s2222.
Information about tumors is usually obtained from a single assessment of a tumor sample, performed at some point in the course of the development and progression of the tumor, with patient characteristics serving as surrogates for the natural history context. Differences between cells within individual tumors (intratumor heterogeneity) and between tumors of different patients (intertumor heterogeneity) may mean that a small sample is not representative of the tumor as a whole, particularly for solid tumors, which are the focus of this paper. This issue is of increasing importance as high-throughput technologies generate large multi-feature data sets in genomics, proteomics, and image analysis. Three potential pitfalls in statistical analysis are discussed (sampling, cut-points, and validation), and suggestions are made about how to avoid them.