An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis.

Affiliations

Suzanne Dworak-Peck School of Social Work, University of Southern California, Los Angeles, California, United States of America.

Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America.

Publication information

PLoS One. 2023 Oct 20;18(10):e0291581. doi: 10.1371/journal.pone.0291581. eCollection 2023.

Abstract

Research with administrative records is challenged by the limited information available in any single data source for answering policy-related questions. Record linkage provides researchers with a tool to supplement administrative datasets with other information about the same people when they are identified in separate sources as matched pairs. Several solutions are available for undertaking record linkage, producing linkage keys for merging data sources for positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy with variations in match and data degradation percentages. Accuracy is influenced more by degradation (e.g., missing data fields, mismatched values) than by the percentage of simulated matches. Second, an application of data linkage is presented to compare regression model estimate performance across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. We discuss strengths of our process for improving match accuracy, such as the use of ensemble methods. We then identify caveats of record linkage in the context of administrative data.
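
To make the idea of probability scoring for candidate record pairs concrete, the following is a minimal sketch of classical Fellegi-Sunter style scoring using only the Python standard library. It is not the authors' PRLF process and does not use the RecordLinkage package API; the field names (including the family-level field `mother_last_name`), the m/u probabilities, and the prior are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical m/u probabilities (assumptions, not values from the paper):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.02),
    "birth_date": (0.98, 0.001),
    "mother_last_name": (0.90, 0.02),  # illustrative family-level field
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # A missing value (one form of data degradation) adds no evidence.
            continue
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight, prior=0.01):
    """Convert a log2 weight to a posterior match probability.

    `prior` is an assumed share of true matches among candidate pairs.
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds * 2 ** weight
    return odds / (1 + odds)


# Example records (fabricated purely for illustration).
child_a = {"first_name": "ana", "last_name": "lopez",
           "birth_date": "2010-03-01", "mother_last_name": "ruiz"}
child_b = dict(child_a)                       # exact duplicate
child_c = dict(child_a, birth_date=None)      # degraded: missing birth date
child_d = {"first_name": "ben", "last_name": "smith",
           "birth_date": "2008-07-15", "mother_last_name": "jones"}

for other in (child_b, child_c, child_d):
    w = match_weight(child_a, other)
    print(round(w, 2), round(match_probability(w), 4))
```

In this sketch a pair with a missing field scores lower than a fully agreeing pair but still well above a pair that disagrees everywhere, which is one simple way to see how degradation can affect linkage accuracy. A machine learning classifier, as described in the abstract, would replace the fixed m/u weights with parameters learned from labeled or simulated pairs.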

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4df/10588881/73411a621808/pone.0291581.g001.jpg
