McTwo: a two-step feature selection algorithm based on maximal information coefficient.

Authors

Ge Ruiquan, Zhou Manli, Luo Youxi, Meng Qinghan, Mai Guoqin, Ma Dongli, Wang Guoqing, Zhou Fengfeng

Affiliations

Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China.

Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China.

Publication information

BMC Bioinformatics. 2016 Mar 23;17:142. doi: 10.1186/s12859-016-0990-0.


DOI: 10.1186/s12859-016-0990-0
PMID: 27006077
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4804474/

Abstract

BACKGROUND: High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets.

RESULTS: This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature.

CONCLUSION: McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.
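The abstract names three goals (features associated with the phenotype, independent of each other, and good nearest-neighbor classification) but does not spell out the two steps. The following is only a minimal sketch of how such a filter-then-wrapper pipeline could look, not the authors' implementation. Everything in it is an assumption made for illustration: absolute Pearson correlation is swapped in for MIC so the example runs with the standard library alone, the thresholds `relevance_min` and `redundancy_max` are hypothetical, and the wrapper is a simple greedy pass scored by leave-one-out 1-nearest-neighbor accuracy.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation. ASSUMPTION: a simple stand-in for MIC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def column(X, j):
    """Return feature j as a list of values, one per sample."""
    return [row[j] for row in X]


def filter_step(X, y, relevance_min=0.3, redundancy_max=0.9):
    """Step 1 (assumed): keep features related to y, skip near-copies of kept ones."""
    scores = {j: abs_pearson(column(X, j), y) for j in range(len(X[0]))}
    ranked = sorted((j for j in scores if scores[j] >= relevance_min),
                    key=lambda j: -scores[j])
    kept = []
    for j in ranked:
        if all(abs_pearson(column(X, j), column(X, k)) < redundancy_max
               for k in kept):
            kept.append(j)
    return kept


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in features)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_step(X, y, candidates):
    """Step 2 (assumed): walk candidates in order, keep one only if accuracy improves."""
    selected, best_acc = [], 0.0
    for j in candidates:
        acc = loo_1nn_accuracy(X, y, selected + [j])
        if acc > best_acc:
            selected.append(j)
            best_acc = acc
    return selected, best_acc


def make_toy_data():
    """Hypothetical data: 20 samples, 3 features (informative, near-copy, noise)."""
    X, y = [], []
    for i in range(20):
        label = 0 if i < 10 else 1
        f0 = 2 * label + 0.1 * (i % 5)   # informative feature
        f1 = f0 + 0.2 * (i % 2)          # near-copy of f0, so redundant
        f2 = (i * 7) % 10                # same values in both classes: noise
        X.append([f0, f1, f2])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    candidates = filter_step(X, y)
    print(candidates, wrapper_step(X, y, candidates))
```

On the toy data the filter keeps only the informative feature, because the near-copy is redundant with it and the noise feature is unrelated to the label. With a real MIC implementation, only `abs_pearson` would need to be replaced.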


Figures (Fig. 1 to Fig. 8):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/2988de2407a9/12859_2016_990_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/145d08972a3c/12859_2016_990_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/aeb0c5bb68d1/12859_2016_990_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/2d293c91fffc/12859_2016_990_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/09f0dd4c11f3/12859_2016_990_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/670e361265f9/12859_2016_990_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/50f2f2db1f5e/12859_2016_990_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7e/4804474/55032cd21ebc/12859_2016_990_Fig8_HTML.jpg

Similar articles

[1]
McTwo: a two-step feature selection algorithm based on maximal information coefficient.

BMC Bioinformatics. 2016-3-23

[2]
Evolutionary Sparsity Regularisation-Based Feature Selection for Binary Classification.

Evol Comput. 2025-6-2

[3]
A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction.

Comput Intell Neurosci. 2021

[4]
Upper-Limb Motion Recognition Based on Hybrid Feature Selection: Algorithm Development and Validation.

JMIR Mhealth Uhealth. 2021-9-2

[5]
An Innovative Excited-ACS-IDGWO Algorithm for Optimal Biomedical Data Feature Selection.

Biomed Res Int. 2020

[6]
A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2009-7-30

[7]
Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm.

Math Biosci Eng. 2022-9-19

[8]
Artificial Intelligence based wrapper for high dimensional feature selection.

BMC Bioinformatics. 2023-10-18

[9]
Improved intelligent water drop-based hybrid feature selection method for microarray data processing.

Comput Biol Chem. 2023-4

[10]
A Hybrid Feature Selection Method Based on Binary State Transition Algorithm and ReliefF.

IEEE J Biomed Health Inform. 2018-9-28

Cited by

[1]
Dynamic Modeling of the Sulfur Cycle in Urban Sewage Pipelines Under High-Temperature and High-Salinity Conditions.

Microorganisms. 2025-6-30

[2]
An efficient, not-only-linear correlation coefficient based on clustering.

Cell Syst. 2024-9-18

[3]
A novel hybrid algorithm based on Harris Hawks for tumor feature gene selection.

PeerJ Comput Sci. 2023-2-13

[4]
A Feature Selection Method Based on Graph Theory for Cancer Classification.

Comb Chem High Throughput Screen. 2024

[5]
A new hybrid algorithm for three-stage gene selection based on whale optimization.

Sci Rep. 2023-3-7

[6]
A Two-Step Feature Selection Radiomic Approach to Predict Molecular Outcomes in Breast Cancer.

Sensors (Basel). 2023-1-31

[7]
csORF-finder: an effective ensemble learning framework for accurate identification of multi-species coding short open reading frames.

Brief Bioinform. 2022-11-19

[8]
Cancer Detection and Prediction Using Genetic Algorithms.

Comput Intell Neurosci. 2022

[9]
A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China.

BMC Infect Dis. 2022-5-23

[10]
i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification.

Front Genet. 2022-4-27

