BidCorpus: A multifaceted learning dataset for public procurement.

Authors

Lima Weslley, Silva Victor, Silva Jasson, Lira Ricardo, Paiva Anselmo

Affiliations

Federal University of Piauí, Campus Universitário Ministro Petrônio Portella, Teresina, Piauí, Brazil.

Federal University of Maranhão, Av. dos Portugueses, 1966, São Luís, Maranhão, Brazil.

Publication information

Data Brief. 2024 Dec 9;58:111202. doi: 10.1016/j.dib.2024.111202. eCollection 2025 Feb.

Abstract

Digital transformation has significantly impacted public procurement, improving operational efficiency, transparency, and competition. It has also enabled the automation of data analysis and oversight in public administration. Public procurement involves various stages and generates a multitude of documents. However, experts still analyze these unstructured textual documents manually, which is time-consuming and inefficient. To address this issue, we introduce BidCorpus, a novel and comprehensive dataset consisting of thousands of documents related to public procurement, specifically bidding notices from Brazilian public websites. The dataset was labeled using weak supervision techniques, manual labeling, and BERT-based language models. Models trained on these annotated data showed promising results, with metrics above 80% in various experiments. The models were also robust to changes deliberately made to bidding notices to evade fraud detection. All resources from this work are publicly available, including the documents, the pre-processing scripts, and the training and evaluation of the models. We expect the dataset and its labels to be of great value to researchers working on public procurement problems.
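
To make the weak supervision step mentioned in the abstract more concrete, the following is a minimal sketch of the general idea: several noisy rule-based labeling functions vote on each document, and the majority label is kept. The abstract does not describe the actual rules or label set used for BidCorpus, so the keyword rules, the label names ("restrictive", "regular"), and the function names below are all hypothetical and serve only as an illustration.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply

# Hypothetical labeling functions (NOT taken from the paper): each one
# inspects a bidding notice and returns a label or ABSTAIN.
def lf_exclusive_brand(text):
    return "restrictive" if "marca exclusiva" in text.lower() else ABSTAIN

def lf_short_deadline(text):
    return "restrictive" if "prazo de 24 horas" in text.lower() else ABSTAIN

def lf_open_competition(text):
    return "regular" if "ampla concorrência" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_exclusive_brand, lf_short_deadline, lf_open_competition]

def weak_label(text):
    """Combine the labeling functions by majority vote; abstain on no votes or ties."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels: leave the document unlabeled
    return ranked[0][0]
```

Documents that end up unlabeled in such a scheme would typically be left for manual annotation, and the weakly labeled ones could then be used to train a classifier such as a BERT-based model.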

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7247/11715116/16fedceb4e2e/gr1.jpg
