(Almost) all of entity resolution.

Author information

Binette Olivier, Steorts Rebecca C

Affiliations

Department of Statistical Science, Duke University, Durham, NC, USA.

Department of Statistical Science, Computer Science, Biostatistics and Bioinformatics, the Rhodes Information Initiative at Duke (iiD) and the Social Science Research Institute (SSRI), Duke University, Durham, NC, USA.

Publication information

Sci Adv. 2022 Mar 25;8(12):eabi8021. doi: 10.1126/sciadv.abi8021.

Abstract

Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a task commonly known as structured entity resolution (record linkage or deduplication). Here, we review motivational applications and seminal papers that have led to the growth of this area. We review modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Last, we discuss current research topics of practical importance.
