Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.

Authors

Zeighami Sepanta, Shahabi Cyrus

Affiliations

University of California, Berkeley. Work done while at USC's Infolab.

University of Southern California.

Publication

Proc Mach Learn Res. 2024 Jul;235:58283-58305.

PMID: 39498334
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11534081/
Abstract

The use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, has been shown to provide substantial performance benefits. However, when datasets change and the data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods, undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on model performance that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops a framework and novel theoretical tools that lay the foundation for future analysis of learned database operations.

Similar articles

1. Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.
   Proc Mach Learn Res. 2024 Jul;235:58283-58305.
2. On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing.
   Proc Mach Learn Res. 2023 Jul;202:40669-40680.
3. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction.
   Int J Med Inform. 2023 May;173:104930. doi: 10.1016/j.ijmedinf.2022.104930. Epub 2022 Nov 19.
4. The probabilistic analysis of language acquisition: theoretical, computational, and experimental analysis.
   Cognition. 2011 Sep;120(3):380-90. doi: 10.1016/j.cognition.2011.02.013. Epub 2011 Mar 26.
5. NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks.
   Proc ACM Manag Data. 2023 May;1(1). doi: 10.1145/3588954. Epub 2023 May 30.
6. Data structure set-trie for storing and querying sets: Theoretical and empirical analysis.
   PLoS One. 2021 Feb 10;16(2):e0245122. doi: 10.1371/journal.pone.0245122. eCollection 2021.
7. A Cardinality Estimator in Complex Database Systems Based on TreeLSTM.
   Sensors (Basel). 2023 Aug 23;23(17):7364. doi: 10.3390/s23177364.
8. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction.
   medRxiv. 2022 Jun 7:2022.06.06.22276062. doi: 10.1101/2022.06.06.22276062.
9. Efficient Offline Reinforcement Learning With Relaxed Conservatism.
   IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5260-5272. doi: 10.1109/TPAMI.2024.3364844. Epub 2024 Jul 2.
10. Machine Learning Did Not Outperform Conventional Competing Risk Modeling to Predict Revision Arthroplasty.
    Clin Orthop Relat Res. 2024 Aug 1;482(8):1472-1482. doi: 10.1097/CORR.0000000000003018. Epub 2024 Mar 12.
