Tumor classification ranking from microarray data.

Authors

Hewett Rattikorn, Kijsanayothin Phongphun

Affiliation

Department of Computer Science, Texas Tech University, Abilene, TX 79601, USA.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

DOI:10.1186/1471-2164-9-S2-S21

PMID:18831787

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2559886/

Abstract

BACKGROUND

Gene expression profiles based on microarray data are recognized as potential diagnostic indices of cancer. Molecular tumor classifications resulting from these data and learning algorithms have advanced our understanding of the genetic changes associated with cancer etiology and development. However, classifications are not always perfect, and in such cases the classification rankings (likelihoods of correct class predictions) can be useful for directing further research (e.g., by deriving inferences about predictive indicators or prioritizing future experiments). Classification ranking is a challenging problem, particularly for microarray data, where there is a huge number of possibly regulated genes and no known rating function. This study investigates whether tumor classification can be made more informative by using a classification ranking method that requires no additional ranking analysis and maintains relatively good classification accuracy.

RESULTS

Microarray data of 11 different types and subtypes of cancer were analyzed using MDR (Multi-Dimensional Ranker), a recently developed boosting-based ranking algorithm. The number of predictor genes in all of the resulting classification models was at most nine, a huge reduction from the more than 12,000 genes in the majority of the expression samples. Compared to several other learning algorithms, MDR gives the greatest AUC (area under the ROC curve) for the classifications of prostate cancer, acute lymphoblastic leukemia (ALL) and four ALL subtypes: BCR-ABL, E2A-PBX1, MALL and TALL. SVM (Support Vector Machine) gives the highest AUC for the classifications of lung, lymphoma, and breast cancers, and two ALL subtypes: Hyperdiploid > 50 and TEL-AML1. MDR gives highly competitive results, producing the highest average AUC, 91.01%, and an average overall accuracy of 90.01% for cancer expression analysis.
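Because the comparison above rests on AUC, it may help to see how AUC scores a ranking rather than a hard class label. The sketch below is not MDR and is not taken from the paper; it only illustrates the standard pairwise reading of AUC: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher ranking score, with ties counted as half. The function name `auc_from_ranking` and the toy scores and labels are assumptions made for illustration.

```python
def auc_from_ranking(scores, labels):
    """Return the AUC of a ranking: the share of (positive, negative)
    pairs where the positive sample is scored higher; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ranking scores for six samples (1 = tumor class, 0 = other).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc_from_ranking(scores, labels))
```

A classifier that only outputs labels gives a single point on the ROC curve, whereas a ranking such as the one MDR is described as producing can be evaluated across all thresholds, which is what a pairwise count of this kind summarizes.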

CONCLUSION

Using the classification rankings from MDR is a simple technique for obtaining effective and informative tumor classifications from cancer gene expression data. Further interpretation of the results obtained from MDR is required. MDR can also be used directly as a simple feature selection mechanism to identify genes relevant to tumor classification. MDR may be applicable to many other classification problems for microarray data.
