Feature Interaction Detection in Big Data Through a New Choquet Integral based Deep Neural Network.

Author information

Fried Matthew, Wang Honggang, Fang Hua

Affiliations

Yeshiva University, New York, USA.

University of Massachusetts Dartmouth and Chan Medical School, Dartmouth, USA.

Publication information

Proc IEEE Int Conf Big Data. 2024 Dec;2024:700-708. doi: 10.1109/bigdata62323.2024.10825719. Epub 2025 Jan 16.

DOI:10.1109/bigdata62323.2024.10825719

PMID:40291853

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12033041/

Abstract

Learning from massive amounts of domain-specific information requires new algorithms and models for parsing the ever-expanding field of big data. Algorithms that explore and identify key features in vast databases must analyze complex interactions to uncover critical features under a variety of circumstances. We study a comprehensive collection of health-related data, showing that our novel Choquet Integral activation function for deep neural networks transforms high-dimensional data into simpler sub-feature sets that better model complex interactions. While standard methods track features individually, they do not extend to subsets of multiple features, which carry impactful and necessary information. To this end, our novel activation function creates a sub-additive tool that better considers the weighted compilation of features within a robust set of standard benchmarks, advancing the understanding of synergistic and antagonistic relationships among features and capturing non-linear dependencies. We present the theoretical underpinnings, highlighting balanced fuzzy measures and sub-additivity for an optimized model based on real-world health data targeting weight loss. We further test different model settings, akin to hyper-parameter optimization. Despite its computational cost, which today's more powerful computing units could reduce, this novel method can be implemented as a pre-trained model using big data to identify heretofore unknown sub-additive feature interactions in a variety of fields such as biomedicine, fraud detection, cyber-security, and finance.
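
The abstract does not give the form of the authors' activation function, so the following is only a minimal sketch of the textbook discrete Choquet integral that such a function would build on, together with a check of sub-additivity for a fuzzy measure. The feature names ("diet", "exercise"), the measure values, and the helper names choquet_integral and is_subadditive are all hypothetical and chosen for illustration; none of them come from the paper.

```python
from itertools import combinations


def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative `values` w.r.t. `measure`.

    values  : dict mapping feature name -> non-negative score
    measure : dict mapping frozenset of feature names -> weight,
              with measure[frozenset()] == 0 and monotone under inclusion
    """
    # Sort features by ascending score: x_(1) <= x_(2) <= ... <= x_(n).
    order = sorted(values, key=values.get)
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        # Features whose score is at least the current one.
        upper_set = frozenset(order[i:])
        total += (values[name] - previous) * measure[upper_set]
        previous = values[name]
    return total


def is_subadditive(measure, tol=1e-12):
    """Return True if mu(A | B) <= mu(A) + mu(B) for all disjoint A, B."""
    for a, b in combinations(measure, 2):
        union = a | b
        if a.isdisjoint(b) and union in measure:
            if measure[union] > measure[a] + measure[b] + tol:
                return False
    return True


# Hypothetical fuzzy measure over two made-up features (assumption).
mu = {
    frozenset(): 0.0,
    frozenset({"diet"}): 0.6,
    frozenset({"exercise"}): 0.7,
    frozenset({"diet", "exercise"}): 1.0,  # 1.0 <= 0.6 + 0.7: sub-additive
}
scores = {"diet": 0.4, "exercise": 0.9}

result = choquet_integral(scores, mu)
print(round(result, 6))    # 0.75 = 0.4 * 1.0 + (0.9 - 0.4) * 0.7
print(is_subadditive(mu))  # True
```

In this toy measure the pair is worth less than the sum of its parts (1.0 versus 1.3), which is the redundant, or antagonistic, kind of interaction that a sub-additive measure encodes. A purely additive weighting could not represent that overlap, because it assigns each subset exactly the sum of its members' weights.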

