Pittelkow Yvonne E, Wilson Susan R
Centre for Bioinformation Science, MSI, The Australian National University, Canberra, ACT 0200, Australia.
J Biomed Biotechnol. 2009;2009:587405. doi: 10.1155/2009/587405. Epub 2010 Jan 10.
Scientific advances are raising expectations that patient-tailored treatment will soon be available. The development of the resulting clinical approaches needs to be based on well-designed experimental and observational procedures that provide data to which proper biostatistical analyses are applied. Gene expression microarray and related technologies are evolving rapidly and provide extremely large gene expression profiles containing many thousands of measurements. Choosing a subset of these measurements to include in a gene expression signature is one of the many challenges to be met. The choice of signature depends on many factors, including the selection of patients in the training set. The reliability and reproducibility of the resulting prognostic gene signature therefore need to be evaluated in a way that is relevant to the clinical setting. A relatively straightforward approach is based on cross-validation, with genes selected separately at each iteration to avoid selection bias. Within this approach we developed two methods: one based on forward selection, the other on genes that were statistically significant in all training blocks of data. We demonstrate our approach to gene signature evaluation with a well-known breast cancer data set.
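The idea of repeating gene selection inside every cross-validation iteration can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the Welch t-statistic ranking, the nearest-centroid classifier, and all function names (`make_synthetic_data`, `select_genes`, `cross_validate`) are assumptions made for illustration. The set of genes chosen in every fold is only loosely analogous to the abstract's "statistically significant in all training blocks" criterion.

```python
import random
from math import sqrt
from statistics import mean, variance

def make_synthetic_data(n_samples=40, n_genes=200, n_informative=5, seed=1):
    """Hypothetical data: the first n_informative genes are shifted in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) + (3.0 if g < n_informative and y[i] == 1 else 0.0)
          for g in range(n_genes)] for i in range(n_samples)]
    return X, y

def t_stat(a, b):
    """Welch t-statistic between two lists of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def select_genes(X, y, rows, k):
    """Rank genes by |t| using only the given training rows; return the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [X[i][g] for i in rows if y[i] == 1]
        b = [X[i][g] for i in rows if y[i] == 0]
        scores.append((abs(t_stat(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid(X, y, train, test, genes):
    """Assign each test row to the class whose training centroid is closer."""
    cent = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        cent[c] = [mean(X[i][g] for i in members) for g in genes]
    preds = []
    for i in test:
        dist = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, cent[c]))
                for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds

def cross_validate(X, y, n_folds=5, k=10, seed=0):
    """Return (accuracy, genes selected in every fold)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct, signatures = 0, []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Selection is redone here, so held-out samples never influence it.
        genes = select_genes(X, y, train, k)
        signatures.append(set(genes))
        preds = nearest_centroid(X, y, train, test, genes)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    stable = set.intersection(*signatures)
    return correct / len(X), stable

X, y = make_synthetic_data()
accuracy, stable_genes = cross_validate(X, y)
print("cross-validated accuracy:", accuracy)
print("genes selected in every fold:", sorted(stable_genes))
```

Selecting genes once on the full data set and then cross-validating only the classifier would let the held-out samples influence the signature, which is the selection bias the abstract refers to. Comparing the per-fold signatures also gives a rough view of how reproducible the selected gene set is across training sets.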