A Distributed Feature Selection Algorithm Based on Distance Correlation with an Application to Microarrays.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):1802-1815. doi: 10.1109/TCBB.2018.2833482. Epub 2018 May 9.

DOI:10.1109/TCBB.2018.2833482

Abstract

DNA microarray datasets are characterized by a large number of features with very few samples, which is a typical cause of overfitting and poor generalization in the classification task. Here, we introduce a novel feature selection (FS) approach which employs the distance correlation (dCor) as a criterion for evaluating the dependence of the class on a given feature subset. The dCor index provides a reliable dependence measure among random vectors of arbitrary dimension, without any assumption on their distribution. Moreover, it is sensitive to the presence of redundant terms. The proposed FS method is based on a probabilistic representation of the feature subset model, which is progressively refined by a repeated process of model extraction and evaluation. A key element of the approach is a distributed optimization scheme based on a vertical partitioning of the dataset, which alleviates the negative effects of its unbalanced dimensions. The proposed method has been tested on several microarray datasets, resulting in quite compact and accurate models obtained at a reasonable computational cost.
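
The abstract names distance correlation as the subset-evaluation criterion but does not give its formula. As a rough illustration, the sketch below computes the standard sample distance correlation (the double-centered distance-matrix estimator) between a feature subset and the class labels. It is a minimal sketch, not the authors' implementation: the function names, the synthetic data, and the choice of which columns to compare are all assumptions made for the example, and the probabilistic model refinement and vertical partitioning described above are not reproduced.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as n samples of one variable
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p) and y (n x q), in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom <= 0:
        return 0.0              # a constant input carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Hypothetical toy data: 40 samples, 5 features, class driven by feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(float)

score_informative = distance_correlation(X[:, [0]], y)   # subset {0}
score_noise = distance_correlation(X[:, [3]], y)         # subset {3}
score_pair = distance_correlation(X[:, [0, 3]], y)       # multi-feature subset {0, 3}
print(score_informative, score_noise, score_pair)
```

Because the estimator works on pairwise distances between samples, the same function scores single features and multi-feature subsets alike, which is what makes it usable as a subset-level criterion. The cost is quadratic in the number of samples, which is modest for microarray data where samples are few.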

Similar articles

A centroid-based gene selection method for microarray data classification.

J Theor Biol. 2016 Jul 7;400:32-41. doi: 10.1016/j.jtbi.2016.03.034. Epub 2016 Apr 4.

An experimental comparison of feature selection methods on two-class biomedical datasets.

Comput Biol Med. 2015 Nov 1;66:1-10. doi: 10.1016/j.compbiomed.2015.08.010. Epub 2015 Aug 24.

A Hybrid Gene Selection Method Based on ReliefF and Ant Colony Optimization Algorithm for Tumor Classification.

Sci Rep. 2019 Jun 20;9(1):8978. doi: 10.1038/s41598-019-45223-x.

A granular computing approach to gene selection.

Biomed Mater Eng. 2014;24(1):1307-14. doi: 10.3233/BME-130933.

C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods.

Comput Methods Programs Biomed. 2019 Sep;178:219-235. doi: 10.1016/j.cmpb.2019.06.029. Epub 2019 Jun 29.

Gene selection for tumor classification using a novel bio-inspired multi-objective approach.

Genomics. 2018 Jan;110(1):10-17. doi: 10.1016/j.ygeno.2017.07.010. Epub 2017 Aug 3.

The feature selection bias problem in relation to high-dimensional gene data.

Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.

Feature selection and classifier performance on diverse biological datasets.

BMC Bioinformatics. 2014;15 Suppl 13(Suppl 13):S4. doi: 10.1186/1471-2105-15-S13-S4. Epub 2014 Nov 13.

Detecting biomarkers from microarray data using distributed correlation based gene selection.

Genes Genomics. 2020 Apr;42(4):449-465. doi: 10.1007/s13258-020-00916-w. Epub 2020 Feb 10.

Cited by

Cancer classification in high dimensional microarray gene expressions by feature selection using eagle prey optimization.

Front Genet. 2025 Mar 21;16:1528810. doi: 10.3389/fgene.2025.1528810. eCollection 2025.

Radiomics based on multiple machine learning methods for diagnosing early bone metastases not visible on CT images.

Skeletal Radiol. 2025 Feb;54(2):335-343. doi: 10.1007/s00256-024-04752-x. Epub 2024 Jul 19.

PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection.

Genes (Basel). 2023 May 10;14(5):1063. doi: 10.3390/genes14051063.

Use of radiomics based on 18F-FDG PET/CT and machine learning methods to aid clinical decision-making in the classification of solitary pulmonary lesions: an innovative approach.

Eur J Nucl Med Mol Imaging. 2021 Aug;48(9):2904-2913. doi: 10.1007/s00259-021-05220-7. Epub 2021 Feb 5.

Paperboard Coating Detection Based on Full-Stokes Imaging Polarimetry.

Sensors (Basel). 2020 Dec 31;21(1):208. doi: 10.3390/s21010208.

An Efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets.

Sci Rep. 2019 Dec 9;9(1):18580. doi: 10.1038/s41598-019-54987-1.
