Tu Yicheng, Eslami Mehrad, Xu Zichen, Charkhgard Hadi
Dept. of Computer Science, University of South Florida, Tampa, Florida, USA.
Jiaxing Neofelis Technology Co., Ltd., Jiaxing, Zhejiang, China.
Proc IEEE Int Conf Big Data. 2022 Dec;2022:252-261. doi: 10.1109/bigdata55660.2022.10020338. Epub 2023 Jan 26.
Sharing data and computation among concurrent queries has been an active research topic in database systems. Work in this area has produced algorithms and systems that have been shown to be effective, but a logical foundation for query processing and optimization is still lacking. In this paper, we present PsiDB, a system model for processing a large number of database queries in a batch. The key idea is to generate a single query expression that returns a global relation containing all the data needed by the individual queries. To do this, we propose using a type of relational operator, called ψ-operators, to combine the individual queries into the global expression. We tackle the algebraic optimization problem in PsiDB by developing equivalence rules that transform concurrent queries in order to reveal query optimization opportunities. Centered on the ψ-operator, our rules not only cover many optimization techniques adopted in existing batch processing systems but also reveal new optimization opportunities. Experiments conducted on an early prototype of PsiDB show a performance improvement of up to 36X over a mainstream commercial DBMS.
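The abstract does not define the ψ-operators, so the sketch below does not implement them. It only illustrates the general batching idea in the simplest, selection-only case, using a well-known stand-in technique: a shared scan with a disjunctive predicate. A hypothetical batch of filter queries over one table is answered by a single pass whose predicate is the OR of the individual predicates. That pass produces one global relation, and each query's result is then recovered by filtering it. The function names `build_global_relation` and `answer_from_global`, and the sample `orders` data, are illustrative assumptions and are not part of PsiDB.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def build_global_relation(table: List[Row], queries: Dict[str, Predicate]) -> List[Row]:
    # Single shared scan: keep every row that at least one query needs
    # (the disjunction of all query predicates).
    return [row for row in table if any(pred(row) for pred in queries.values())]


def answer_from_global(global_rel: List[Row], queries: Dict[str, Predicate]) -> Dict[str, List[Row]]:
    # Each individual result is recovered by filtering the (smaller)
    # global relation with that query's own predicate.
    return {qid: [row for row in global_rel if pred(row)] for qid, pred in queries.items()}


# Hypothetical example batch of two selection queries over one table.
orders = [
    {"id": 1, "amount": 50, "region": "EU"},
    {"id": 2, "amount": 500, "region": "US"},
    {"id": 3, "amount": 900, "region": "EU"},
    {"id": 4, "amount": 20, "region": "APAC"},
]
queries = {
    "q1": lambda r: r["region"] == "EU",
    "q2": lambda r: r["amount"] > 400,
}

global_rel = build_global_relation(orders, queries)  # rows 1, 2, 3 (row 4 is needed by no query)
results = answer_from_global(global_rel, queries)
print({qid: [r["id"] for r in rows] for qid, rows in results.items()})  # {'q1': [1, 3], 'q2': [2, 3]}
```

Here the base table is scanned once for the whole batch rather than once per query. Row 3, which both queries need, appears only once in the global relation. The equivalence rules described in the abstract address the much more general case in which the combined queries involve joins and other operators.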