Liat Ein-Dor, Itai Kela, Gad Getz, David Givol, Eytan Domany
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel.
Bioinformatics. 2005 Jan 15;21(2):171-8. doi: 10.1093/bioinformatics/bth469. Epub 2004 Aug 12.
Predicting the metastatic potential of primary malignant tissues has direct bearing on the choice of therapy. Several microarray studies yielded gene sets whose expression profiles successfully predicted survival. Nevertheless, the overlap between these gene sets is almost zero. Similarly small overlaps have also been observed in other complex diseases, and the variables that could account for the differences have attracted wide interest. One of the main open questions in this context is whether the disparity can be attributed only to trivial reasons such as different technologies, different patients and different types of analyses.
To answer this question, we concentrated on a single breast cancer dataset and analyzed it with a single method, the one used by van't Veer et al. to produce a set of outcome-predictive genes. We show that the resulting set of genes is not unique: it is strongly influenced by the subset of patients used for gene selection. Many equally predictive lists could have been produced from the same analysis. Three main properties of the data explain this sensitivity: (1) many genes are correlated with survival; (2) the differences between these correlations are small; (3) the correlations fluctuate strongly when measured over different subsets of patients. A possible biological explanation for these properties is discussed.
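The abstract attributes the gene-selection procedure to van't Veer et al. without spelling it out, so the following is only a minimal toy sketch under stated assumptions, not the authors' method or data. It assumes a simplified selection rule: rank genes by absolute Pearson correlation with a binary outcome label, computed on a random subset of patients, and keep the top N. The cohort is synthetic and is built to have the three properties listed above: many genes carry the same weak outcome signal (properties 1 and 2), and selection is repeated on different patient subsets, which makes the measured correlations fluctuate (property 3). All function names and parameters (`simulate_data`, `top_genes`, `list_stability`, effect size, subset size, N) are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate_data(n_patients, n_genes, n_informative, effect, rng):
    """Hypothetical synthetic cohort, not real expression data.

    Labels alternate 0/1 (1 = poor outcome, an assumption). The first
    n_informative genes are shifted by the SAME small effect in poor-outcome
    patients, so many genes correlate with outcome and their true
    correlations are nearly identical; the remaining genes are pure noise.
    """
    labels = [i % 2 for i in range(n_patients)]
    expr = []
    for g in range(n_genes):
        shift = effect if g < n_informative else 0.0
        expr.append([shift * y + rng.gauss(0.0, 1.0) for y in labels])
    return expr, labels


def top_genes(expr, labels, patients, top_n):
    """Assumed selection rule: top_n genes by |correlation| on a patient subset."""
    sub_labels = [labels[p] for p in patients]
    scores = []
    for g, row in enumerate(expr):
        r = pearson([row[p] for p in patients], sub_labels)
        scores.append((abs(r), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:top_n]}


def list_stability(expr, labels, subset_size, top_n, n_rounds, rng):
    """Repeat selection on random patient subsets; return the gene lists and
    their mean pairwise overlap (fraction of the top_n genes shared)."""
    patients = list(range(len(labels)))
    lists = [top_genes(expr, labels, rng.sample(patients, subset_size), top_n)
             for _ in range(n_rounds)]
    overlaps = [len(a & b) / top_n for a, b in combinations(lists, 2)]
    return lists, sum(overlaps) / len(overlaps)


rng = random.Random(0)
expr, labels = simulate_data(n_patients=60, n_genes=500, n_informative=150,
                             effect=0.6, rng=rng)
lists, mean_overlap = list_stability(expr, labels, subset_size=40, top_n=50,
                                     n_rounds=5, rng=rng)
print(f"mean pairwise overlap of top-50 lists: {mean_overlap:.2f}")
```

Because all informative genes share one effect size, which of them land in the top N is decided largely by sampling noise in the chosen subset. Making `effect` vary widely across genes, or shrinking `n_informative`, is a quick way to see the lists become more stable in this toy setting.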
http://www.weizmann.ac.il/physics/complex/compphys/downloads/liate/