Large Scale Document Inversion using a Multi-threaded Computing System.

Author information

Jung Sungbo, Chang Dar-Jen, Park Juw Won

Affiliations

University of Louisville, Computer Engineering and Computer Science, Louisville, KY 40292, 1-502-852-0467.

University of Louisville, Computer Engineering and Computer Science, Louisville, KY 40292, 1-502-852-0472.

Publication information

ACM SIGAPP Appl Comput Rev. 2017 Jun;17(2):27-35. doi: 10.1145/3131080.3131083. Epub 2017 Aug 3.

Abstract

Microprocessor architecture is moving toward multi-core, multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of multiple cores, it can serve as a massively parallel coprocessor, and it is an affordable, attractive, user-programmable commodity device. Meanwhile, a huge volume of data, including digital libraries, social networking services, e-commerce product data, and reviews, is produced or collected every moment, and it is growing dramatically in size. The inverted index is a useful data structure for full-text search and document retrieval, but building one over a large number of documents takes a tremendous amount of time. Multi-threaded, multi-core GPUs can improve the performance of document inversion. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, using the computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system runs 2 to 3 times faster than a sequential system on two test datasets: PubMed abstracts and e-commerce product reviews.
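As a rough illustration of what hash-based document inversion produces, the sketch below builds an inverted index in Python. It is not the authors' CUDA implementation: the function names (`tokenize`, `invert_partition`, `merge_indexes`, `build_inverted_index`), the tokenization rule, the `num_workers` parameter, and the sample documents are all assumptions made for this example. Every partition is processed by the same function, which mirrors the SPMD idea, but the partitions run one after another here rather than in parallel.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def invert_partition(docs):
    """Build a partial inverted index for one partition of (doc_id, text) pairs.

    The dict acts as the hash table, so each posting is added in expected
    constant time and the pass is linear in the number of tokens.
    """
    index = defaultdict(list)
    for doc_id, text in docs:
        # set() keeps each document at most once per term.
        for term in set(tokenize(text)):
            index[term].append(doc_id)
    return index


def merge_indexes(partials):
    """Combine partial indexes into one index with sorted posting lists."""
    merged = defaultdict(list)
    for partial in partials:
        for term, postings in partial.items():
            merged[term].extend(postings)
    return {term: sorted(postings) for term, postings in merged.items()}


def build_inverted_index(docs, num_workers=2):
    """Split the documents into partitions and run the same code on each.

    The partitions are processed sequentially in this sketch; on a GPU each
    one would be handled by its own thread or thread block.
    """
    docs = list(docs)
    partitions = [docs[i::num_workers] for i in range(num_workers)]
    return merge_indexes(invert_partition(p) for p in partitions)


# Hypothetical sample documents, used only for illustration.
docs = [
    (0, "GPU computing for document indexing"),
    (1, "Inverted index for full text search"),
    (2, "GPU accelerated text search"),
]
index = build_inverted_index(docs)
print(index["gpu"])   # documents that contain the term "gpu"
```

Looking up a term in `index` returns the sorted list of document IDs that contain it, which is the operation a full-text search relies on.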

CCS CONCEPTS

• Information systems ➝ Information retrieval • Computing methodologies ➝ Massively parallel and high-performance simulations.
