Alexe Gabriela, Alexe Sorin, Axelrod David E, Bonates Tibérius O, Lozina Irina I, Reiss Michael, Hammer Peter L
RUTCOR (Rutgers University Center for Operations Research), Piscataway, New Jersey, USA.
Breast Cancer Res. 2006;8(4):R41. doi: 10.1186/bcr1512.
The potential of data analysis tools applied to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with two objectives: to discover patterns characteristic of cases with good or poor outcome and use them for accurate and justifiable predictions; and to derive novel information about the role of genes, the existence of special classes of cases, and other factors.
Data were analyzed using LAD, a method based on combinatorics and optimization that was recently shown to provide highly accurate diagnostic and prognostic systems in cardiology, cancer proteomics, hematology, pulmonology, and other disciplines.
LAD identified a subset of 17 of the 25,000 genes that fully distinguishes patients with poor prognoses from those with good prognoses. An extensive list of 'patterns' or 'combinatorial biomarkers' (that is, combinations of genes and limitations on their expression levels) was generated, and 40 patterns were used to create a prognostic system, shown to have 100% and 92.9% weighted accuracy on the training and test sets, respectively. The prognostic system uses fewer genes than other methods, and its accuracy is similar to or better than that reported in other studies. Of the 17 genes identified by LAD, three were shown to play a significant role in determining poor prognosis and five in determining good prognosis. Two new classes of patients (described by similar sets of covering patterns, gene expression ranges, and clinical features) were discovered. As a by-product of the study, it is shown that the training and test sets of van 't Veer have differing characteristics.
The study shows that LAD provides an accurate and fully explanatory prognostic system for breast cancer using genomic data (that is, a system that, in addition to predicting good or poor prognosis, provides an individualized explanation of the reasons for that prognosis for each patient). Moreover, the LAD model provides valuable insights into the roles of individual and combinatorial biomarkers, allows the discovery of new classes of patients, and generates a vast library of biomedical research hypotheses.
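The notions of a 'pattern' (a combination of genes with limits on their expression levels) and of a prognosis that comes with its own explanation can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the gene names `gene_A` and `gene_B`, the cut-points, and the scoring rule (the fraction of good-prognosis patterns covering a case minus the fraction of poor-prognosis patterns covering it) are hypothetical and are not taken from the study, whose actual 17 genes, 40 patterns, and weighting scheme are not given in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One limit on one gene's expression level (hypothetical form)."""
    gene: str
    op: str      # "<=" or ">"
    cut: float   # cut-point on the expression level

    def holds(self, sample: dict) -> bool:
        value = sample[self.gene]
        return value <= self.cut if self.op == "<=" else value > self.cut


@dataclass(frozen=True)
class Pattern:
    """A conjunction of conditions associated with one outcome class."""
    label: str            # "good" or "poor"
    conditions: tuple     # tuple of Condition objects

    def covers(self, sample: dict) -> bool:
        # A pattern covers a case only if every condition is satisfied.
        return all(c.holds(sample) for c in self.conditions)


def prognosis(sample: dict, patterns: list):
    """Return (label, score, covering patterns) for one case.

    Assumed scoring rule: fraction of good-prognosis patterns that cover
    the case minus the fraction of poor-prognosis patterns that cover it.
    The covering patterns double as the per-patient explanation.
    """
    fired = [p for p in patterns if p.covers(sample)]

    def fraction(label: str) -> float:
        total = sum(p.label == label for p in patterns)
        hits = sum(p.label == label for p in fired)
        return hits / total if total else 0.0

    score = fraction("good") - fraction("poor")
    if score > 0:
        label = "good"
    elif score < 0:
        label = "poor"
    else:
        label = "unclassified"
    return label, score, fired


# Hypothetical patterns and one hypothetical case.
patterns = [
    Pattern("good", (Condition("gene_A", ">", 0.2),
                     Condition("gene_B", "<=", 0.0))),
    Pattern("poor", (Condition("gene_A", "<=", 0.2),)),
]
case = {"gene_A": 0.5, "gene_B": -0.3}
label, score, fired = prognosis(case, patterns)
```

In this sketch the list `fired` plays the role of the individualized explanation: it names exactly which expression-level conditions led to the predicted class for that case.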