Deutsch J M
University of California, Santa Cruz, USA.
Bioinformatics. 2003 Jan;19(1):45-52. doi: 10.1093/bioinformatics/19.1.45.
Microarray data have recently been shown to be effective in distinguishing closely related cell types that often appear in different forms of cancer, but microarrays are not yet practical for clinical use. However, the data might be used to construct a minimal set of marker genes that could then be used clinically, through antibody assays, to diagnose a specific type of cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, each using a different combination of genes, to generate a set of optimal predictors.
We apply this method to the leukemia data of the Whitehead/MIT group, which aims to differentially diagnose two kinds of leukemia, and to the data of Khan et al., which distinguishes four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 to fewer than 15 while still classifying all of their test data perfectly. We also apply this method to two other cases: diffuse large B-cell lymphoma data (Shipp et al., 2002) and the data of Ramaswamy et al. on multiclass diagnosis of 14 common tumor types.
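To make the idea of evolving an ensemble of gene-subset predictors more concrete, the following is a minimal Python sketch of a replication-style search. It is not the paper's algorithm: the scoring rule (leave-one-out nearest-neighbour errors, with gene count as a tie-breaker), the mutation step, the "keep the better half and replicate it" rule, the synthetic data, and all function names (`loo_errors`, `mutate`, `evolve`, `make_toy_data`) are assumptions chosen only for illustration.

```python
import random


def loo_errors(X, y, genes):
    """Count leave-one-out 1-nearest-neighbour errors using only `genes`."""
    errors = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        errors += best_label != y[i]
    return errors


def score(X, y, genes):
    """Lower is better: errors first, then number of genes (assumed rule)."""
    return (loo_errors(X, y, genes), len(genes))


def mutate(genes, n_genes, rng):
    """Randomly drop one gene or add one gene (hypothetical mutation step)."""
    genes = set(genes)
    if len(genes) > 1 and rng.random() < 0.5:
        genes.remove(rng.choice(sorted(genes)))
    else:
        genes.add(rng.randrange(n_genes))
    return frozenset(genes)


def evolve(X, y, ensemble_size=20, generations=30, init_size=5, seed=0):
    """Evolve an ensemble of gene subsets; returns them sorted best-first."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    ensemble = [frozenset(rng.sample(range(n_genes), init_size))
                for _ in range(ensemble_size)]
    for _ in range(generations):
        # Each predictor produces one mutated copy of itself.
        candidates = ensemble + [mutate(g, n_genes, rng) for g in ensemble]
        candidates.sort(key=lambda g: score(X, y, g))
        # Keep the better half of the ensemble size and replicate it.
        survivors = candidates[:max(1, ensemble_size // 2)]
        ensemble = [survivors[i % len(survivors)]
                    for i in range(ensemble_size)]
    return sorted(ensemble, key=lambda g: score(X, y, g))


def make_toy_data(n_samples=20, n_genes=30, seed=1):
    """Synthetic two-class data in which only gene 0 is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] = 10.0 * label + rng.gauss(0.0, 0.3)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
ensemble = evolve(X, y)
best = ensemble[0]
print("best gene subset:", sorted(best), "LOO errors:", loo_errors(X, y, best))
```

The sketch keeps a whole ensemble rather than a single best subset, so several distinct small gene sets can survive side by side. A real analysis would replace the toy data with expression measurements and evaluate the final predictors on held-out test samples.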