Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection.

Authors

Hassan S Sakira, Ruusuvuori Pekka, Latonen Leena, Huttunen Heikki

Affiliations

Department of Signal Processing, Tampere University of Technology, Tampere, Finland.

Pori Department, Tampere University of Technology, Pori, Finland.; BioMediTech, University of Tampere, Tampere, Finland.

Publication information

Cancer Inform. 2016 Apr 10;14(Suppl 5):75-85. doi: 10.4137/CIN.S30795. eCollection 2015.

Abstract

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simple machine learning pipelines. Earlier studies have shown that, in certain cases, detection accuracy can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that remain reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.
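
To make the idea of selecting a model along an ℓ1 regularization path more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a plain proximal-gradient (ISTA) solver written from scratch, no intercept term, synthetic Gaussian data in place of flow cytometry measurements, and ordinary k-fold cross-validation as the selection rule. All function names (`fit_l1_logreg`, `cv_accuracy`, and so on) are hypothetical, and the Bayesian error estimator mentioned in the abstract is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(w, X, y):
    """Gradient of the mean logistic loss (no intercept, for brevity)."""
    errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - label
            for row, label in zip(X, y)]
    return [sum(e * row[j] for e, row in zip(errs, X)) / len(X)
            for j in range(len(w))]


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return v - t if v > t else v + t if v < -t else 0.0


def fit_l1_logreg(X, y, lam, n_iter=200):
    """Minimise mean logistic loss + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = len(X), len(X[0])
    # 0.25 * mean squared row norm bounds the Lipschitz constant of the gradient.
    step = 1.0 / (0.25 * sum(v * v for row in X for v in row) / n)
    w = [0.0] * d
    for _ in range(n_iter):
        g = gradient(w, X, y)
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, g)]
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted class (score > 0) matches the label."""
    hits = sum((sum(wj * xj for wj, xj in zip(w, row)) > 0) == (label == 1)
               for row, label in zip(X, y))
    return hits / len(X)


def cv_accuracy(X, y, lam, k=5):
    """Plain k-fold cross-validation accuracy for one value of lam."""
    n, scores = len(X), []
    for fold in range(k):
        test = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test]
        w = fit_l1_logreg([X[i] for i in train], [y[i] for i in train], lam)
        scores.append(accuracy(w, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


# Synthetic stand-in data (NOT flow cytometry): 40 samples, 6 features,
# of which only the first two carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [1 if row[0] - row[1] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]

# Smallest penalty at which every coefficient is exactly zero.
lam_max = max(abs(g) for g in gradient([0.0] * 6, X, y))
lams = [lam_max * f for f in (1.0, 0.5, 0.2, 0.05)]

# Regularization path: one sparse model per penalty value.
path = [(lam, fit_l1_logreg(X, y, lam)) for lam in lams]
cv_scores = [cv_accuracy(X, y, lam) for lam in lams]
best_lam = lams[cv_scores.index(max(cv_scores))]

for (lam, w), score in zip(path, cv_scores):
    n_selected = sum(wj != 0.0 for wj in w)
    print(f"lam={lam:.4f}  selected={n_selected}  cv_acc={score:.2f}")
print("chosen lam:", best_lam)
```

Each penalty value yields a model whose nonzero coefficients define the selected feature subset, so choosing a point on the path amounts to choosing how many features to keep. With very small sample sizes the cross-validation scores are themselves noisy, which is one reason to consider alternative error estimators.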
