Department of Electronic and Computer Engineering, University Campus, Technical University of Crete, Greece.
Artif Intell Med. 2011 Sep;53(1):57-71. doi: 10.1016/j.artmed.2011.06.003. Epub 2011 Jul 20.
Gene expression patterns that distinguish clinically significant disease subclasses may not only play a prominent role in diagnosis but also lead to therapeutic strategies that tailor treatment to the particular biology of each disease. Nevertheless, gene expression signatures derived through statistical feature-extraction procedures on population datasets have been justifiably criticized, since they share few genes in common, even when derived from the same dataset. We focus on the complementary knowledge conveyed by two or more gene-expression signatures through the biological processes and pathways embedded in them, which in turn form a meta-knowledge platform for analysis, aiming at a more global, robust and powerful solution.
The main contribution of this work is the introduction and study of an approach for integrating different gene signatures on the basis of their underlying biological knowledge, in an attempt to derive a unified global solution. It is also recognized that one group's signature does not perform well on another group's data, owing to incompatibilities in microarray technologies and experimental design. We assess this cross-platform aspect and show that a unified solution derived through both statistical and biological validation may also help to overcome such inconsistencies.
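The idea that two signatures can share few genes yet overlap at the pathway level can be illustrated with a toy sketch. This is not the authors' procedure: the gene names, pathway names, the `gene_to_pathways` annotation and the `unify` rule (keep genes annotated to pathways covered by both signatures) are all hypothetical assumptions chosen for illustration only.

```python
# Hypothetical toy data: placeholder gene and pathway names, not taken from the study.
signature_a = {"gene_1", "gene_2", "gene_3"}
signature_b = {"gene_3", "gene_4", "gene_5"}

# Assumed gene-to-pathway annotation (in practice this would come from a curated resource).
gene_to_pathways = {
    "gene_1": {"pathway_x"},
    "gene_2": {"pathway_y"},
    "gene_3": {"pathway_x", "pathway_z"},
    "gene_4": {"pathway_y"},
    "gene_5": {"pathway_w"},
}


def pathways_of(signature, annotation):
    """Return every pathway touched by at least one gene of the signature."""
    covered = set()
    for gene in signature:
        covered |= annotation.get(gene, set())
    return covered


def unify(sig_a, sig_b, annotation):
    """Illustrative rule only: keep genes from either signature that are
    annotated to a pathway covered by both signatures."""
    shared = pathways_of(sig_a, annotation) & pathways_of(sig_b, annotation)
    return {g for g in sig_a | sig_b if annotation.get(g, set()) & shared}


shared_genes = signature_a & signature_b          # overlap at the gene level
unified = unify(signature_a, signature_b, gene_to_pathways)

print(sorted(shared_genes))
print(sorted(unified))
```

In this toy case the signatures share a single gene, while the pathway-level rule retains four genes, which is the kind of complementarity the approach aims to exploit.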
Based on the proposed approach, we derived a unified 69-gene signature that significantly outperforms the initial signatures, achieving an accuracy of 0.73 on 234 new patients, with 81% sensitivity and 64% specificity. The same signature reveals the two prognostic groups in an additional dataset of 286 new patients obtained through a different experimental protocol and microarray platform. Furthermore, it derives two clusters in a dataset from a different platform, and these clusters differ markedly at both the gene-expression and survival-prediction levels.
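For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity and accuracy relate to the counts of a two-class confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: positive-class cases predicted correctly/incorrectly
    tn/fp: negative-class cases predicted correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)                 # fraction of positives detected
    specificity = tn / (tn + fp)                 # fraction of negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all cases correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not from the paper).
sens, spec, acc = classification_metrics(tp=40, fn=10, tn=30, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Accuracy is the average of sensitivity and specificity weighted by the share of positive and negative cases, so it always lies between the two, as it does for the figures reported above.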