Xu Lei, Geman Donald, Winslow Raimond L
The Institute for Computational Medicine and Center for Cardiovascular Bioinformatics and Modeling, Johns Hopkins University, Baltimore, MD 21218, USA.
BMC Bioinformatics. 2007 Jul 30;8:275. doi: 10.1186/1471-2105-8-275.
There is a continuing need to develop molecular diagnostic tools that complement histopathologic examination and increase the accuracy of cancer diagnosis. DNA microarrays provide a means of measuring gene expression signatures, which can then be used as components of genomic-based diagnostic tests to determine the presence of cancer.
In this study, we collect and integrate ~1500 microarray gene expression profiles from 26 published cancer data sets across 21 major human cancer types. We then apply a statistical method, referred to as the Top-Scoring Pair of Groups (TSPG) classifier, together with a repeated random sampling strategy, to the integrated training data sets and identify a common cancer signature consisting of 46 genes. These 46 genes divide naturally into two distinct groups; in cancer tissues, genes in one group are typically expressed at lower levels than genes in the other group. Given a new expression profile, the classifier discriminates cancer from normal tissue by ranking the expression values of the 46 signature genes and comparing the average ranks of the two groups. The signature is then validated by applying this decision rule to independent test data.
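The decision rule described above (rank the signature genes within one profile, then compare the average ranks of the two groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the gene identifiers, the averaging of tied ranks, and the choice to call a tie "normal" are all assumptions made for illustration, and the actual 46 signature genes are not listed here.

```python
from statistics import mean


def rank_values(values):
    """Return 1-based ranks (smallest value gets rank 1).

    Tied values share the average of the positions they occupy
    (an assumption; the abstract does not say how ties are handled).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def classify_profile(expression, low_group, high_group):
    """Classify one expression profile as 'cancer' or 'normal'.

    expression: dict mapping gene identifier -> expression value.
    low_group:  genes expected to be expressed less in cancer tissue.
    high_group: genes expected to be expressed more in cancer tissue.
    Ranking is done only among the signature genes of this one profile.
    """
    genes = list(low_group) + list(high_group)
    ranks = rank_values([expression[g] for g in genes])
    n_low = len(low_group)
    mean_low = mean(ranks[:n_low])
    mean_high = mean(ranks[n_low:])
    # Assumption: equal average ranks are reported as 'normal'.
    return "cancer" if mean_low < mean_high else "normal"


# Hypothetical four-gene signature, for illustration only.
LOW = ["geneA", "geneB"]
HIGH = ["geneC", "geneD"]
profile = {"geneA": 1.0, "geneB": 2.0, "geneC": 5.0, "geneD": 6.0}
print(classify_profile(profile, LOW, HIGH))  # cancer
```

Because the rule depends only on the relative ordering of expression values within a single profile, it is unchanged by any monotone increasing transformation of that profile's values, which is one plausible reason a rank-based rule suits data pooled from many different studies.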
By combining the TSPG method and repeated random sampling, a robust common cancer signature has been identified from large-scale microarray data integration. Upon further validation, this signature may be useful as a robust and objective diagnostic test for cancer.