Wang Lipo, Chu Feng
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.
Annu Int Conf IEEE Eng Med Biol Soc. 2010;2010:807-10. doi: 10.1109/IEMBS.2010.5626565.
We present an approach to deriving very simple classification rules from microarray data by first selecting very small gene subsets that can ensure highly accurate classification of cancers. Finding such minimum gene subsets can greatly reduce the computational load and the "noise" arising from irrelevant genes. The derived classification rules allow for accurate diagnosis without the need for any classifier. This work can simplify gene expression tests by including only a very small number of genes rather than thousands or tens of thousands, which can significantly reduce the cost of cancer testing. These studies also call for further investigation into possible biological relationships between this small number of genes and cancer development and treatment. For example, we report the following simple, yet 100% accurate, diagnostic rules involving only 2 genes to separate the 3 types of lymphoma patients: the patient has diffuse large B-cell lymphoma (DLBCL) if and only if the expression level of gene GENE1622X is greater than -0.75; the patient has chronic lymphocytic leukaemia (CLL) if and only if the expression level of gene GENE540X is less than -1; and the patient has follicular lymphoma (FL) otherwise, i.e., if and only if the expression level of gene GENE1622X is less than -0.75 and the expression level of gene GENE540X is greater than -1.
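The two-gene rule quoted above can be written as a short threshold check. This is only an illustrative sketch, not the authors' code: the function name `classify_lymphoma` and its parameter names are hypothetical, the inputs are assumed to be expression levels on the same normalized scale as the thresholds in the abstract, and the abstract does not say how to handle values exactly at a threshold or a sample that would satisfy both the DLBCL and CLL conditions. Here the DLBCL condition is checked first and boundary values fall through to the next rule, both of which are assumptions.

```python
def classify_lymphoma(gene1622x: float, gene540x: float) -> str:
    """Apply the two-gene threshold rule from the abstract.

    gene1622x, gene540x: expression levels of GENE1622X and GENE540X,
    assumed to be on the same scale as the published thresholds.
    """
    # DLBCL: GENE1622X expression greater than -0.75.
    # Assumption: this condition is tested first if both rules would fire.
    if gene1622x > -0.75:
        return "DLBCL"
    # CLL: GENE540X expression less than -1.
    if gene540x < -1.0:
        return "CLL"
    # FL otherwise (GENE1622X below -0.75 and GENE540X above -1).
    # Assumption: values exactly at a threshold also end up here.
    return "FL"


if __name__ == "__main__":
    # Made-up expression values chosen only to exercise each branch.
    for sample in [(0.2, 0.5), (-1.3, -1.8), (-1.1, 0.4)]:
        print(sample, classify_lymphoma(*sample))
```

The example values in the `__main__` block are invented for demonstration and are not taken from the lymphoma dataset.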