Tumor classification ranking from microarray data.

Authors

Hewett Rattikorn, Kijsanayothin Phongphun

Affiliation

Department of Computer Science, Texas Tech University, Abilene, TX 79601, USA.

Publication

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Abstract

BACKGROUND

Gene expression profiles based on microarray data are recognized as potential diagnostic indices of cancer. Molecular tumor classifications resulting from these data and learning algorithms have advanced our understanding of genetic changes associated with cancer etiology and development. However, classifications are not always perfect, and in such cases the classification rankings (likelihoods of correct class predictions) can be useful for directing further research (e.g., by deriving inferences about predictive indicators or prioritizing future experiments). Classification ranking is a challenging problem, particularly for microarray data, where there is a huge number of possibly regulated genes and no known rating function. This study investigates whether tumor classification can be made more informative by using a classification ranking method that requires no additional ranking analysis and maintains relatively good classification accuracy.

RESULTS

Microarray data of 11 different types and subtypes of cancer were analyzed using MDR (Multi-Dimensional Ranker), a recently developed boosting-based ranking algorithm. The number of predictor genes in all of the resulting classification models was at most nine, a huge reduction from the more than 12,000 genes in the majority of the expression samples. Compared to several other learning algorithms, MDR gives the greatest AUC (area under the ROC curve) for the classifications of prostate cancer, acute lymphoblastic leukemia (ALL) and four ALL subtypes: BCR-ABL, E2A-PBX1, MALL and TALL. SVM (Support Vector Machine) gives the highest AUC for the classifications of lung, lymphoma, and breast cancers, and two ALL subtypes: Hyperdiploid > 50 and TEL-AML1. MDR gives highly competitive results, producing the highest average AUC, 91.01%, and an average overall accuracy of 90.01% for cancer expression analysis.
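As an illustration of how AUC evaluates a ranking rather than a set of hard class labels, the following minimal sketch computes AUC as the fraction of (positive, negative) sample pairs that the ranking scores order correctly. The function name `auc_from_scores` and the toy scores and labels are assumptions made for this example; they are not data or code from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: ranking scores, higher means "more likely positive".
    labels: 1 for the positive class, 0 for the negative class.
    Tied pairs count as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical ranking scores for six samples (not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.4f}")
```

In this toy ranking, one positive sample is scored below one negative sample, so 8 of the 9 positive-negative pairs are ordered correctly. A classifier that outputs only hard labels cannot be assessed this way, which is why ranking scores make the classification more informative.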

CONCLUSION

Using the classification rankings from MDR is a simple technique for obtaining effective and informative tumor classifications from cancer gene expression data. Further interpretation of the results obtained from MDR is required. MDR can also be used directly as a simple feature selection mechanism to identify genes relevant to tumor classification. MDR may be applicable to many other classification problems for microarray data.
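The idea that a boosting-based model can double as a feature selector can be sketched with generic AdaBoost over one-gene decision stumps: each boosting round picks a single gene, so the final model uses at most as many genes as there are rounds. This is not the MDR algorithm, whose details are not given in the abstract; the functions `best_stump`, `adaboost_select`, and `predict`, and the toy expression matrix, are all hypothetical.

```python
import math


def best_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if (polarity if row[j] > t else -polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best


def adaboost_select(X, y, rounds):
    """Run AdaBoost with stumps; return the model and the features it uses.

    Labels in y must be +1 or -1.
    """
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        if err <= 1e-10:  # a perfect stump: nothing left to reweight
            break
        # Increase the weight of misclassified samples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * (pol if row[j] > t else -pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    selected = sorted({j for _, j, _, _ in model})
    return model, selected


def predict(model, row):
    """Sign of the weighted stump votes; the raw sum can serve as a ranking score."""
    score = sum(a * (pol if row[j] > t else -pol) for a, j, t, pol in model)
    return 1 if score > 0 else -1


# Hypothetical expression values: 6 samples x 4 genes; gene index 2 is informative.
toy_X = [
    [5.1, 2.0, 0.3, 7.7],
    [4.9, 2.2, 0.5, 3.1],
    [5.0, 1.9, 0.4, 5.6],
    [5.2, 2.1, 2.8, 4.0],
    [4.8, 2.0, 3.1, 7.0],
    [5.0, 2.3, 2.9, 3.3],
]
toy_y = [-1, -1, -1, 1, 1, 1]

toy_model, toy_selected = adaboost_select(toy_X, toy_y, rounds=5)
print("selected gene indices:", toy_selected)
```

On this toy matrix only the gene at index 2 separates the two classes, so the sketch stops after one round and reports that single gene. On real microarray data the selected set would depend on the number of rounds and on the weak learner, and the result would still need biological interpretation.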
