Multiple similarly effective solutions exist for biomedical feature selection and classification problems.

Author information

Liu Jiamei, Xu Cheng, Yang Weifeng, Shu Yayun, Zheng Weiwei, Zhou Fengfeng

Affiliations

College of Software, Jilin University, Changchun, Jilin, 130012, China.

College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.

Publication information

Sci Rep. 2017 Oct 9;7(1):12830. doi: 10.1038/s41598-017-13184-8.

DOI:10.1038/s41598-017-13184-8

PMID:28993656

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5634418/

Abstract

Binary classification is widely used to support decisions on biomedical big data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with and without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate between samples from the two groups. However, most classification algorithms tend to generate a single locally optimal solution, determined by the input dataset and the mathematical assumptions made about it. Here we demonstrate, from the aspects of both disease classification and feature selection, that multiple different solutions may have similar classification performance. Existing machine learning algorithms may therefore have ignored a whole school of fish by catching only one good one. Since most existing machine learning algorithms generate a solution by optimizing a mathematical goal, considering both the generated solution and the ignored ones may be essential for understanding the biological mechanisms behind the investigated classification question.
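
The central claim, that different feature subsets can classify about equally well, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian data, the nearest-centroid classifier, and all names (`make_dataset`, `nearest_centroid_accuracy`, `subset_a`, `subset_b`). Two disjoint feature subsets carry the same class signal, so a classifier restricted to either one should score similarly on held-out data, while a subset of pure-noise features should stay near chance.

```python
import random


def make_dataset(n_samples, n_features, informative, shift, rng):
    """Return (X, y). Informative columns are shifted by `shift` for class 1."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels to get balanced classes
        row = [rng.gauss(shift * label if j in informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def centroid(rows, cols):
    """Mean of the selected columns over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Fit a nearest-centroid classifier on `cols`, then score it on held-out data."""
    c0 = centroid([x for x, t in zip(X_train, y_train) if t == 0], cols)
    c1 = centroid([x for x, t in zip(X_train, y_train) if t == 1], cols)
    correct = 0
    for x, t in zip(X_test, y_test):
        d0 = sum((x[c] - m) ** 2 for c, m in zip(cols, c0))
        d1 = sum((x[c] - m) ** 2 for c, m in zip(cols, c1))
        pred = 1 if d1 < d0 else 0
        correct += pred == t
    return correct / len(y_test)


rng = random.Random(0)            # fixed seed so the run is repeatable
subset_a = [0, 1, 2]              # first hypothetical "biomarker panel"
subset_b = [3, 4, 5]              # a disjoint panel carrying the same signal
noise_subset = [6, 7, 8]          # features unrelated to the class label
informative = set(subset_a + subset_b)

X_train, y_train = make_dataset(400, 10, informative, 2.0, rng)
X_test, y_test = make_dataset(400, 10, informative, 2.0, rng)

acc_a = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, subset_a)
acc_b = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, subset_b)
acc_noise = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, noise_subset)

print(f"subset A accuracy:     {acc_a:.3f}")
print(f"subset B accuracy:     {acc_b:.3f}")
print(f"noise subset accuracy: {acc_noise:.3f}")
```

An optimizer that returns only `subset_a` would hide `subset_b` even though, in this toy setting, both are equally informative by construction. Real biomedical data will not be this clean, so the sketch shows the phenomenon rather than how to detect it.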

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c61/5634418/3d5e5c5fce8c/41598_2017_13184_Fig1_HTML.jpg
