(Almost) all of entity resolution.

Authors

Binette Olivier, Steorts Rebecca C

Affiliations

Department of Statistical Science, Duke University, Durham, NC, USA.

Department of Statistical Science, Computer Science, Biostatistics and Bioinformatics, the Rhodes Information Initiative at Duke (iiD) and the Social Science Research Institute (SSRI), Duke University, Durham, NC, USA.

Publication information

Sci Adv. 2022 Mar 25;8(12):eabi8021. doi: 10.1126/sciadv.abi8021.

DOI: 10.1126/sciadv.abi8021
PMID: 35333582
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11636688/
Abstract

Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, commonly known as structured entity resolution (record linkage or deduplication). Here, we review motivational applications and seminal papers that have led to the growth of this area. We review modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Last, we discuss current research topics of practical importance.
