Beta Distribution-Based Cross-Entropy for Feature Selection.

Authors

Dai Weixing, Guo Dianjing

Affiliation

School of Life Science and State Key Laboratory of Agrobiotechnology, G94, Science Center South Block, The Chinese University of Hong Kong, Shatin 999077, Hong Kong, China.

Publication

Entropy (Basel). 2019 Aug 7;21(8):769. doi: 10.3390/e21080769.

DOI: 10.3390/e21080769
PMID: 33267482
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7515297/

Abstract

Analysis of high-dimensional data is a challenge in machine learning and data mining. Feature selection plays an important role in dealing with high-dimensional data for improvement of predictive accuracy, as well as better interpretation of the data. Frequently used evaluation functions for feature selection include resampling methods such as cross-validation, which show an advantage in predictive accuracy. However, these conventional methods are not only computationally expensive, but also tend to be over-optimistic. We propose a novel cross-entropy which is based on beta distribution for feature selection. In beta distribution-based cross-entropy (BetaDCE) for feature selection, the probability density is estimated by beta distribution and the cross-entropy is computed by the expected value of beta distribution, so that the generalization ability can be estimated more precisely than conventional methods where the probability density is learnt from data. Analysis of the generalization ability of BetaDCE revealed that it was a trade-off between bias and variance. The robustness of BetaDCE was demonstrated by experiments on three types of data. In the exclusive or-like (XOR-like) dataset, the false discovery rate of BetaDCE was significantly smaller than that of other methods. For the leukemia dataset, the area under the curve (AUC) of BetaDCE on the test set was 0.93 with only four selected features, which indicated that BetaDCE not only detected the irrelevant and redundant features precisely, but also more accurately predicted the class labels with a smaller number of features than the original method, whose AUC was 0.83 with 50 features. In the metabonomic dataset, the overall AUC of prediction with features selected by BetaDCE was significantly larger than that by the original reported method. Therefore, BetaDCE can be used as a general and efficient framework for feature selection.
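The abstract does not give the BetaDCE formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary class label, a uniform Beta(1, 1) prior, and a single cell of the feature space holding n samples of which k are positive; the function names are hypothetical. It compares the usual plug-in entropy (probability learnt directly from the data) with the posterior expectation of the cross-entropy when the class probability p follows Beta(k + 1, n - k + 1). For integer parameters, E[ln p] = H(a - 1) - H(a + b - 1), where H is the harmonic number, so only the standard library is needed.

```python
import math


def harmonic(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(1.0 / i for i in range(1, n + 1))


def expected_log_terms(k, n):
    """Return (E[ln p], E[ln(1 - p)]) for p ~ Beta(k + 1, n - k + 1).

    Uses the identity E[ln p] = psi(a) - psi(a + b), which for integer
    a and b reduces to a difference of harmonic numbers.
    """
    a, b = k + 1, n - k + 1
    h_total = harmonic(a + b - 1)
    return harmonic(a - 1) - h_total, harmonic(b - 1) - h_total


def plug_in_cross_entropy(k, n):
    """Entropy of the empirical frequency k/n (probability learnt from data)."""
    p = k / n
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def beta_expected_cross_entropy(k, n):
    """Posterior expectation of the cross-entropy between the posterior
    mean (k + 1) / (n + 2) and the uncertain class probability p.

    This is an illustrative stand-in, assumed for this sketch only.
    """
    e_log_p, e_log_q = expected_log_terms(k, n)
    mean_p = (k + 1) / (n + 2)
    return -(mean_p * e_log_p + (1.0 - mean_p) * e_log_q)


if __name__ == "__main__":
    for k, n in [(2, 2), (20, 20), (200, 200)]:
        print(n, plug_in_cross_entropy(k, n), beta_expected_cross_entropy(k, n))
```

In this sketch a pure cell always has a plug-in value of zero, however few samples it holds, whereas the beta-based value stays well above zero for a two-sample cell and shrinks as the cell grows. That illustrates why a score learnt directly from small samples can be over-optimistic, which is the concern the abstract raises about conventional evaluation functions.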

Cited by

1
Weighted Mean Squared Deviation Feature Screening for Binary Features.
Entropy (Basel). 2020 Mar 14;22(3):335. doi: 10.3390/e22030335.