Multiclass classification of microarray data samples with a reduced number of genes.

Affiliation

CIFASIS-Conicet Institute, Bv, 27 de Febrero 210 Bis, Rosario, Argentina.

Publication

BMC Bioinformatics. 2011 Feb 22;12:59. doi: 10.1186/1471-2105-12-59.

DOI: 10.1186/1471-2105-12-59

PMID: 21342522

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3056725/

Abstract

BACKGROUND

Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained.
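The background above argues that gene selection needs an externally supplied cap on the size of the gene set. The Python sketch below is an illustration only. It shows where such a cap would enter a simple filter-style selection step. The ranking criterion is an assumption chosen for illustration and is not this paper's method: it uses the ratio of between-class to within-class sum of squares (BSS/WSS). The helper names `bss_wss_scores` and `select_top_genes` are hypothetical. The cap `max_genes` is passed in by the caller and is not computed from the paper's bound, which is not reproduced here.

```python
from statistics import mean


def bss_wss_scores(X, y):
    """Score each gene by BSS/WSS (hypothetical helper, illustrative criterion).

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, one per sample.
    """
    n_genes = len(X[0])
    classes = sorted(set(y))
    scores = []
    for g in range(n_genes):
        values = [row[g] for row in X]
        overall = mean(values)
        bss = wss = 0.0
        for c in classes:
            vc = [v for v, label in zip(values, y) if label == c]
            mc = mean(vc)
            bss += len(vc) * (mc - overall) ** 2  # spread of class means
            wss += sum((v - mc) ** 2 for v in vc)  # spread inside each class
        if wss > 0:
            scores.append(bss / wss)
        else:
            # Perfectly tight classes: infinite score only if the means differ;
            # a constant gene (bss == 0) carries no class information.
            scores.append(float("inf") if bss > 0 else 0.0)
    return scores


def select_top_genes(X, y, max_genes):
    """Return the indices of the max_genes highest-scoring genes.

    max_genes is an assumed, externally supplied cap, e.g. an estimate of
    how many genes a classifier can handle.
    """
    scores = bss_wss_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:max_genes]


if __name__ == "__main__":
    # Toy data: 4 samples, 3 genes, 2 classes (made up for illustration).
    X = [[1.0, 5.0, 0.2], [1.1, 5.2, 0.9], [3.0, 5.1, 0.5], [3.2, 4.9, 0.1]]
    y = ["A", "A", "B", "B"]
    print(select_top_genes(X, y, 2))  # [0, 1]
```

The cap is a plain argument by design. A tighter or looser estimate of the number of genes a classifier can handle changes only `max_genes`, and the ranking stays the same.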

RESULTS

A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples.
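A binary-mediated multiclass classifier splits a K-class problem into several binary problems. It then maps the vector of binary outputs back to a class. One well-known family of such schemes is error-correcting output codes (ECOC). The sketch below assumes an ECOC setting and uses Hamming-distance decoding. It does not reproduce the paper's bound. It only illustrates one standard reason why a higher-dimensional binary output domain can help: a codebook with minimum pairwise Hamming distance d can absorb floor((d - 1) / 2) wrong binary outputs and still decode correctly. The codebooks and the function names are hypothetical examples.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length binary tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook):
    """Minimum pairwise Hamming distance between class codewords."""
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))


def correctable_errors(codebook):
    """Binary-classifier errors that nearest-codeword decoding always absorbs."""
    return (min_distance(codebook) - 1) // 2


def decode(bits, codebook):
    """Return the class whose codeword is closest to the binary outputs.

    Ties go to the first class in codebook order.
    """
    return min(codebook, key=lambda c: hamming(bits, codebook[c]))


# One-vs-rest: 4 binary classifiers (short codewords, min distance 2).
ONE_VS_REST = {
    "c0": (1, 0, 0, 0),
    "c1": (0, 1, 0, 0),
    "c2": (0, 0, 1, 0),
    "c3": (0, 0, 0, 1),
}

# Hand-picked length-7 codebook for the same 4 classes (min distance 4).
LONG_CODE = {
    "c0": (1, 1, 1, 1, 1, 1, 1),
    "c1": (0, 0, 0, 0, 1, 1, 1),
    "c2": (0, 0, 1, 1, 0, 0, 1),
    "c3": (0, 1, 0, 1, 0, 1, 0),
}

if __name__ == "__main__":
    print(correctable_errors(ONE_VS_REST), correctable_errors(LONG_CODE))  # 0 1
    # Class c2's codeword with its first binary output flipped still decodes to c2.
    print(decode((1, 0, 1, 1, 0, 0, 1), LONG_CODE))  # c2
```

Each extra bit in the codeword means training one more binary classifier. That is why the number of genes each binary classifier may use matters in this setting.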

CONCLUSIONS

A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples.

Similar articles

Multiclass classification of microarray data samples with a reduced number of genes.

BMC Bioinformatics. 2011 Feb 22;12:59. doi: 10.1186/1471-2105-12-59.

Instance-based concept learning from multiclass DNA microarray data.

BMC Bioinformatics. 2006 Feb 16;7:73. doi: 10.1186/1471-2105-7-73.

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Bioinformatics. 2007 May 1;23(9):1106-14. doi: 10.1093/bioinformatics/btm036.

A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression.

Bioinformatics. 2004 Oct 12;20(15):2429-37. doi: 10.1093/bioinformatics/bth267. Epub 2004 Apr 15.

Chaotic genetic algorithm for gene selection and classification problems.

OMICS. 2009 Oct;13(5):407-20. doi: 10.1089/omi.2009.0007.

Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates.

J Theor Biol. 2009 Aug 7;259(3):533-40. doi: 10.1016/j.jtbi.2009.04.013. Epub 2009 May 3.

Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers.

Biosystems. 2007 Jul-Aug;90(1):78-86. doi: 10.1016/j.biosystems.2006.07.002. Epub 2006 Jul 10.

A genetic programming-based approach to the classification of multiclass microarray datasets.

Bioinformatics. 2009 Feb 1;25(3):331-7. doi: 10.1093/bioinformatics/btn644. Epub 2008 Dec 16.

Multiclass classification of sarcomas using pathway based feature selection method.

J Theor Biol. 2014 Dec 7;362:3-8. doi: 10.1016/j.jtbi.2014.06.038. Epub 2014 Jul 8.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Cited by

Sulfatase 2 Is Associated with Steroid Resistance in Childhood Nephrotic Syndrome.

J Clin Med. 2021 Feb 2;10(3):523. doi: 10.3390/jcm10030523.

DNA Barcoding through Quaternary LDPC Codes.

PLoS One. 2015 Oct 22;10(10):e0140459. doi: 10.1371/journal.pone.0140459. eCollection 2015.

Diagnostic biases in translational bioinformatics.

BMC Med Genomics. 2015 Aug 1;8:46. doi: 10.1186/s12920-015-0116-y.

Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer.

BMC Bioinformatics. 2015 May 20;16:166. doi: 10.1186/s12859-015-0565-5.
