• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

流模型和滑动窗口模型中的公平最大最小多样性最大化

Fair Max-Min Diversity Maximization in Streaming and Sliding-Window Models.

作者信息

Wang Yanhao, Fabbri Francesco, Mathioudakis Michael, Li Jia

机构信息

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China.

Spotify, 08000 Barcelona, Spain.

出版信息

Entropy (Basel). 2023 Jul 14;25(7):1066. doi: 10.3390/e25071066.

DOI:10.3390/e25071066
PMID:37510013
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10378839/
Abstract

Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set of elements, the problem asks for a subset of k≪n elements with maximum diversity, as quantified by the dissimilarities among the elements in . In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max-min diversity maximization problem, which selects a subset that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set is partitioned into disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset contains ki elements from each group i∈[m]. Although diversity maximization has been extensively studied, existing algorithms for fair max-min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings.

摘要

多样性最大化是一个在数据汇总、网络搜索和推荐系统等领域有广泛应用的基本问题。给定一组元素,该问题要求找出一个包含k≪n个元素的子集,其具有最大的多样性,这是通过子集中元素之间的差异来量化的。在本文中,我们研究了流模型和滑动窗口模型中带有公平性约束的多样性最大化问题。具体来说,我们关注最大最小多样性最大化问题,即选择一个子集,使得子集中任意两个不同元素之间的最小距离(差异)最大化。假设集合由一个特定的敏感属性(例如性别或种族)划分为m个不相交的组,确保公平性要求所选子集在每个组i∈[m]中包含ki个元素。尽管多样性最大化已经得到了广泛研究,但现有的公平最大最小多样性最大化算法对于数据流来说效率低下。为了解决这个问题,我们首先在(仅插入)流模型中为该问题设计了高效的近似算法,在这种模型中数据一次到达一个元素,并且应该基于一次遍历中观察到的元素来计算解决方案。此外,我们还为滑动窗口模型中的这个问题提出了近似算法,在该模型中,仅考虑流中最新的元素进行计算,以捕捉数据的时效性。在真实世界和合成数据集上的实验结果表明,我们的算法在流模型和滑动窗口设置下运行速度比最先进的离线算法快几个数量级的同时,还能提供质量相当的解决方案。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/13a5e4f01115/entropy-25-01066-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/78572eb9c350/entropy-25-01066-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/767ec8b495cc/entropy-25-01066-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/53f53eff3427/entropy-25-01066-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/12333a946eb8/entropy-25-01066-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/7cf35cf92ad3/entropy-25-01066-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/e574d2d3a8ea/entropy-25-01066-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/a6e03951b1fa/entropy-25-01066-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/2750c1082fc2/entropy-25-01066-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/96426e87e10e/entropy-25-01066-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/ceb0511426ab/entropy-25-01066-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/346a004837c7/entropy-25-01066-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/9c87da1ded6b/entropy-25-01066-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/13a5e4f01115/entropy-25-01066-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/78572eb9c350/entropy-25-01066-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/767ec8b495cc/entropy-25-01066-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/53f53eff3427/entropy-25-01066-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/12333a946eb8/entropy-25-01066-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/7cf35cf92ad3/entropy-25-01066-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/e574d2d3a8ea/entropy-25-01066-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/a6e03951b1fa/entropy-25-01066-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/2750c1082fc2/entropy-25-01066-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/96426e87e10e/entropy-25-01066-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/ceb0511426ab/entropy-25-01066-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/346a004837c7/entropy-25-01066-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/9c87da1ded6b/entropy-25-01066-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf81/10378839/13a5e4f01115/entropy-25-01066-g013.jpg

相似文献

1
Fair Max-Min Diversity Maximization in Streaming and Sliding-Window Models.流模型和滑动窗口模型中的公平最大最小多样性最大化
Entropy (Basel). 2023 Jul 14;25(7):1066. doi: 10.3390/e25071066.
2
Near Optimal Linear Algebra in the Online and Sliding Window Models.在线和滑动窗口模型中的近似最优线性代数
Proc Annu Symp Found Comput Sci. 2020;1:517-528.
3
Designing a Streaming Algorithm for Outlier Detection in Data Mining-An Incrementa Approach.设计一种用于数据挖掘中异常值检测的流算法-一种增量方法。
Sensors (Basel). 2020 Feb 26;20(5):1261. doi: 10.3390/s20051261.
4
The k partition-distance problem.k划分距离问题。
J Comput Biol. 2012 Apr;19(4):404-17. doi: 10.1089/cmb.2010.0186.
5
Estimating Multilevel Models on Data Streams.在数据流上估计多层模型。
Psychometrika. 2019 Mar;84(1):41-64. doi: 10.1007/s11336-018-09656-z. Epub 2019 Jan 22.
6
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.非平稳多臂赌博机:一种新概念漂移感知算法的实证评估
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
7
Efficient Minimum Cost Seed Selection With Theoretical Guarantees for Competitive Influence Maximization.高效最小成本种子选择,具有竞争影响力最大化的理论保证。
IEEE Trans Cybern. 2021 Dec;51(12):6091-6104. doi: 10.1109/TCYB.2020.2966593. Epub 2021 Dec 22.
8
A Streaming model for Generalized Rayleigh with extension to Minimum Noise Fraction.一种扩展到最小噪声分数的广义瑞利流模型。
Proc IEEE Int Conf Big Data. 2019 Dec;2019:74-83. doi: 10.1109/BigData47090.2019.9006512. Epub 2020 Feb 24.
9
Polarity related influence maximization in signed social networks.带符号社交网络中与极性相关的影响力最大化
PLoS One. 2014 Jul 25;9(7):e102199. doi: 10.1371/journal.pone.0102199. eCollection 2014.
10
Ranking with submodular functions on a budget.预算约束下基于次模函数的排序
Data Min Knowl Discov. 2022;36(3):1197-1218. doi: 10.1007/s10618-022-00833-4. Epub 2022 Apr 23.

引用本文的文献

1
Employing machine learning for early detection of poly-victimization in rural children: a survey study in China's Chaoshan region.运用机器学习对农村儿童多重受害情况进行早期检测:中国潮汕地区的一项调查研究
BMC Public Health. 2025 Jul 5;25(1):2391. doi: 10.1186/s12889-025-23610-6.