
[Assessment of a method for automatic match classification in probabilistic data linkage].

Authors

Duarte Daniela de Almeida Pereira, Corrêa Camila Soares Lima, Fayer Vívian Assis, Nogueira Mário Círio, Bustamante-Teixeira Maria Teresa

Affiliations

Universidade Federal de Juiz de Fora, Juiz de Fora, Brasil.

Divisão de Saúde, Universidade Federal de Viçosa, Viçosa, Brasil.

Publication information

Cad Saude Publica. 2019 Nov 11;35(11):e00066419. doi: 10.1590/0102-311X00066419. eCollection 2019.

DOI: 10.1590/0102-311X00066419
PMID: 31721900

Abstract

The objective was to test and assess the accuracy of a scoring method in probabilistic data linkage, in order to enable automatic identification of true matches and dispense with the manual inspection stage. This was an accuracy study using data from the Breast Cancer Information System (SISMAMA) database in Minas Gerais State, Brazil, from 2009 and 2010. After cleaning and standardization, a 16-step probabilistic linkage of the 2009 and 2010 databases was performed, with each step inspected manually to obtain a gold standard. Samples were then selected, inspected, and assessed to calculate the method's accuracy in selecting true matches. All the steps and the samples with 200 and 300 matches showed high sensitivity (recall, > 0.97), high positive predictive value (precision, > 0.95), high accuracy (> 0.97) and F measure (> 0.96), and high area under the precision-recall curve (> 0.98). The sample with 100 matches showed high values for these measures, but with low scores. Of the 16 steps assessed, the combined use of only three was sufficient to identify 99.24% of the true matches in the total database. The proposed method allows databases to be linked automatically while maintaining the method's accuracy. It facilitates the use of probabilistic linkage in health services, especially for health surveillance and management.
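
The accuracy measures named in the abstract (sensitivity/recall, positive predictive value/precision, accuracy, and F measure) can all be derived from a confusion matrix that compares the automatic classification against the manually inspected gold standard. The sketch below is illustrative only: the `Confusion` class, the `linkage_metrics` function, and the counts are hypothetical and are not taken from the study, and it assumes every denominator is non-zero.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Record-pair counts: automatic classification vs. manual gold standard."""
    tp: int  # classified as match, confirmed by manual inspection
    fp: int  # classified as match, rejected by manual inspection
    fn: int  # true match that the automatic classification missed
    tn: int  # non-match correctly rejected


def linkage_metrics(c: Confusion) -> dict[str, float]:
    """Compute the standard measures; assumes non-zero denominators."""
    recall = c.tp / (c.tp + c.fn)            # sensitivity
    precision = c.tp / (c.tp + c.fp)         # positive predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts for a 300-pair sample (NOT data from the study).
example = Confusion(tp=190, fp=6, fn=4, tn=100)
metrics = linkage_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The area under the precision-recall curve also reported in the abstract needs the per-pair scores rather than a single confusion matrix, so it is left out of this sketch.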


Similar articles

1. [Assessment of a method for automatic match classification in probabilistic data linkage].
   Cad Saude Publica. 2019 Nov 11;35(11):e00066419. doi: 10.1590/0102-311X00066419. eCollection 2019.
2. On the Accuracy and Scalability of Probabilistic Data Linkage Over the Brazilian 114 Million Cohort.
   IEEE J Biomed Health Inform. 2018 Mar;22(2):346-353. doi: 10.1109/JBHI.2018.2796941.
3. A comparison of accuracy and computational feasibility of two record linkage algorithms in retrieving vital status information from HIV/AIDS patients registered in Brazilian public databases.
   Int J Med Inform. 2018 Jun;114:45-51. doi: 10.1016/j.ijmedinf.2018.03.005. Epub 2018 Mar 20.
4. Comparative validity of methods to select appropriate cutoff weight for probabilistic linkage without unique personal identifiers.
   Pharmacoepidemiol Drug Saf. 2016 Apr;25(4):444-52. doi: 10.1002/pds.3832. Epub 2015 Jul 14.
5. [Inclusion of a deterministic post-processing stage to increase the performance of probabilistic record linkage].
   Cad Saude Publica. 2018 Jun 21;34(6):e00088117. doi: 10.1590/0102-311X00088117.
6. Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania.
   BMC Med Res Methodol. 2018 Dec 10;18(1):165. doi: 10.1186/s12874-018-0632-5.
7. Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil.
   BMC Med Inform Decis Mak. 2021 Jun 15;21(1):190. doi: 10.1186/s12911-021-01550-6.
8. Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil.
   BMC Med Inform Decis Mak. 2017 Jul 17;17(1):108. doi: 10.1186/s12911-017-0506-5.
9. Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies.
   PLoS One. 2015 Aug 24;10(8):e0136179. doi: 10.1371/journal.pone.0136179. eCollection 2015.
10. Interobserver reliability in the classification of pairs of records formed by probabilistic linkage of SISMAMA databases.
    Rev Bras Epidemiol. 2019 Sep 2;22:e190045. doi: 10.1590/1980-549720190045.

Cited by

1. Inequality by Skin Color in Breast Cancer Screening in Brazil: a Differences-in-Differences Analysis of the COVID-19 Pandemic.
   J Racial Ethn Health Disparities. 2025 Apr;12(2):685-691. doi: 10.1007/s40615-024-01908-2. Epub 2024 Jan 16.