Yap YeeLeng, Zhang XueWu, Ling M T, Wang Xianghong, Wong Y C, Danchin Antoine
HKU-Pasteur Research Centre, Dexter H.C. Man Building, 8 Sassoon Road, Pokfulam, Hong Kong, China.
BMC Cancer. 2004 Oct 7;4:72. doi: 10.1186/1471-2407-4-72.
Precise classification of cancer types is critically important for early cancer diagnosis and treatment. Numerous efforts have been made to use gene expression profiles to improve precision of tumor classification. However, reliable cancer-related signals are generally lacking.
Using recent colon and prostate cancer datasets, a data transformation procedure from single gene expression to pair-wise gene expression ratio is proposed. By exploiting the internal consistency of each expression profiling dataset, this transformation improves the signal-to-noise ratio of the dataset and uncovers new relevant cancer-related signals (features). The efficiency of using the transformed dataset for normal/tumor classification was investigated using feature partitioning with informative features (gene annotation) as discriminating axes (single gene expression or pair-wise gene expression ratio). Classification results were compared with those obtained from the original datasets for model classifiers of up to 10 features.
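The transformation from single gene expression to pair-wise ratios can be illustrated with a minimal sketch. The function name `pairwise_ratios`, the genes-by-samples matrix layout, the choice of one ratio per unordered pair (gene i over gene j, with i < j), and the toy values are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np
from itertools import combinations

def pairwise_ratios(expr):
    """Turn single-gene expression into pair-wise expression ratios.

    expr: array of shape (n_genes, n_samples) with strictly positive values
          (assumed layout; zeros would need handling before division).
    Returns the list of (i, j) gene index pairs with i < j and an array of
    shape (n_pairs, n_samples) holding expr[i] / expr[j] for each pair.
    """
    expr = np.asarray(expr, dtype=float)
    pairs = list(combinations(range(expr.shape[0]), 2))
    i_idx = np.array([p[0] for p in pairs], dtype=int)
    j_idx = np.array([p[1] for p in pairs], dtype=int)
    ratios = expr[i_idx] / expr[j_idx]  # element-wise, sample by sample
    return pairs, ratios

# Toy data (hypothetical): 3 genes measured in 3 samples.
expr = np.array([[2.0, 4.0, 8.0],
                 [1.0, 2.0, 4.0],
                 [4.0, 4.0, 4.0]])
pairs, ratios = pairwise_ratios(expr)
```

With n selected genes this yields n(n-1)/2 candidate ratio features, so 82 genes would give 3321 pairs under this one-ratio-per-pair assumption.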
From the colon and prostate datasets, 82 and 262 genes, respectively, were selected for their high correlation with tissue phenotype. Remarkably, transforming the highly noisy expression data lowered the coefficient of variation (CV) of the within-class samples and improved the correlation with tissue phenotypes. The transformed dataset exhibited lower CV than single gene expression. In the colon cancer set, the minimum CV decreased from 45.3% to 16.5%. In prostate cancer, comparable CV was achieved with and without transformation. This improvement in CV, coupled with the improved correlation between the pair-wise gene expression ratio and tissue phenotypes, yielded higher classification efficiency, especially with the colon dataset, where it rose from 87.1% to 93.5%. Over 90% of the top ten discriminating axes in both datasets showed significant improvement after data transformation. The high classification efficiency achieved suggests that some cancer-related signals exist in the form of pair-wise gene expression ratios.
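To make the two quantities above concrete, the following sketch computes a within-class CV and a correlation with tissue phenotype for a single gene and for a ratio of two genes. The helper `within_class_cv`, the use of the sample standard deviation, the Pearson correlation, and the toy numbers are assumptions; the abstract does not state which estimators were used.

```python
import numpy as np

def within_class_cv(values, labels):
    """Coefficient of variation (%) of one feature within each class.

    CV = 100 * sample standard deviation / mean, computed per class label.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    cv = {}
    for c in np.unique(labels):
        v = values[labels == c]
        cv[c.item()] = 100.0 * v.std(ddof=1) / v.mean()
    return cv

# Toy data (hypothetical): two genes that share sample-to-sample noise.
labels = np.array(["normal", "normal", "tumor", "tumor"])
gene_a = np.array([10.0, 30.0, 20.0, 60.0])
gene_b = np.array([5.0, 15.0, 40.0, 120.0])
ratio = gene_a / gene_b

cv_single = within_class_cv(gene_a, labels)  # single-gene feature
cv_ratio = within_class_cv(ratio, labels)    # pair-wise ratio feature

# Pearson correlation with phenotype coded as 0 = normal, 1 = tumor.
phenotype = (labels == "tumor").astype(float)
r_single = np.corrcoef(gene_a, phenotype)[0, 1]
r_ratio = np.corrcoef(ratio, phenotype)[0, 1]
```

In this constructed example the noise common to both genes cancels in the ratio, so the ratio has a lower within-class CV and a stronger (here negative) correlation with phenotype than either gene alone. Real data will not cancel this cleanly.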
The results of this study indicate that: 1) when the pair-wise expression ratio transformation achieves lower CV and higher correlation with tissue phenotypes, better classification of tissue type follows; 2) the comparable classification accuracy achieved after data transformation suggests that the pair-wise gene expression ratio between some pairs of genes can identify reliable markers for cancer.