Fried Matthew, Wang Honggang, Fang Hua
Yeshiva University, New York, USA.
University of Massachusetts Dartmouth and Chan Medical School, Dartmouth, USA.
Proc IEEE Int Conf Big Data. 2024 Dec;2024:700-708. doi: 10.1109/bigdata62323.2024.10825719. Epub 2025 Jan 16.
Learning from massive amounts of domain-specific information requires new algorithms and models for parsing the ever-expanding field of big data. Algorithms that explore vast databases must analyze complex interactions to uncover critical features under a variety of circumstances. We study a comprehensive collection of health-related data and show that our novel Choquet integral activation function for deep neural networks transforms high-dimensional data into simpler sub-feature sets that better model complex interactions. While standard methods track features individually, they do not extend to multiple feature subsets, which carry important and necessary information. To this end, our novel activation function provides a sub-additive tool that better accounts for the weighted combination of features across a robust set of standard benchmarks, capturing synergistic and antagonistic relationships among features as well as non-linear dependencies. We present the theoretical underpinnings, highlighting balanced fuzzy measures and sub-additivity, for an optimized model based on real-world health data targeting weight loss. We further test different model settings, akin to hyper-parameter optimization. Although the method is computationally time-consuming, a cost that more powerful modern computing units could reduce, it can be implemented as a pre-trained model using big data to identify previously unknown sub-additive feature interactions in fields such as biomedicine, fraud detection, cyber-security, and finance.
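For readers unfamiliar with the aggregation the abstract builds on, the following is a minimal sketch of the textbook discrete Choquet integral with respect to a fuzzy measure. It is not the authors' activation function or training procedure, which the abstract does not specify. The function names `choquet_integral` and `cardinality_measure`, the cardinality-based measure mu(A) = (|A|/n)^p, and the sample inputs are all assumptions made for illustration; inputs are assumed to be non-negative.

```python
from itertools import combinations


def cardinality_measure(n, p=0.5):
    """Hypothetical fuzzy measure mu(A) = (|A| / n) ** p on subsets of n features.

    It is monotone, with mu(empty set) = 0 and mu(all features) = 1.
    For 0 < p <= 1 it is sub-additive: mu(A | B) <= mu(A) + mu(B).
    With p = 1 it is additive, so the integral reduces to the plain mean.
    """
    return {
        frozenset(subset): (len(subset) / n) ** p
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }


def choquet_integral(x, mu):
    """Discrete Choquet integral of non-negative values x w.r.t. measure mu.

    Sort the values in ascending order, then sum each increment
    x_(k) - x_(k-1) weighted by the measure of the set of features
    whose value is at least x_(k).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    total, previous = 0.0, 0.0
    for k, i in enumerate(order):
        coalition = frozenset(order[k:])  # features with value >= x[i]
        total += (x[i] - previous) * mu[coalition]
        previous = x[i]
    return total


features = [0.2, 0.5, 0.9]  # illustrative, non-negative feature values
additive = choquet_integral(features, cardinality_measure(3, p=1.0))
subadditive = choquet_integral(features, cardinality_measure(3, p=0.5))
```

Because the measure assigns a weight to every subset of features rather than to each feature alone, the result depends on which features are jointly large. In this sketch the sub-additive measure gives small coalitions more weight than their share of the features, so the aggregate sits above the arithmetic mean while staying between the smallest and largest input.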