How well does NamSor perform in predicting the country of origin and ethnicity of individuals based on their first and last names?

Affiliation

University Institute for Primary Care (IuMFE), University of Geneva, Geneva, Switzerland.

Publication information

PLoS One. 2023 Nov 16;18(11):e0294562. doi: 10.1371/journal.pone.0294562. eCollection 2023.

Abstract

BACKGROUND

We aimed to evaluate NamSor's performance in predicting the country of origin and ethnicity of individuals based on their first/last names.

METHODS

We retrieved the name and country of affiliation of all authors of PubMed publications in 2021 who were affiliated with universities in the twenty-two countries whose researchers authored ≥1,000 medical publications and whose percentage of migrants was <2.5% (N = 88,699). Using NamSor, we estimated each author's most likely "continent of origin" (Asia/Africa/Europe), "country of origin" and "ethnicity". We also examined two variables that we created: "continent#2" ("Europe" replaced by "Europe/America/Oceania") and "country#2" ("Spain" replaced by "Spain/Hispanic American country" and "Portugal" replaced by "Portugal/Brazil"). Using "country of affiliation" as a proxy for "country of origin", we calculated for these five variables the proportion of misclassifications (= errorCodedWithoutNA) and the proportion of non-classifications (= naCoded). We repeated the analyses with a subsample consisting of all results with inference accuracy ≥50%.
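A minimal Python sketch of how the two metrics could be computed, assuming the definitions commonly used in name-based gender-inference benchmarks: errorCodedWithoutNA = misclassified / classified names, and naCoded = non-classified / all names. It also assumes that in the ≥50% subsample, lower-accuracy inferences are counted as non-classified, which is consistent with naCoded rising from zero to 18.0%. The `Inference` record, its `probability` field, the matcher functions and the deliberately partial `HISPANIC_AMERICA` set are hypothetical illustrations, not NamSor's API or the authors' code; the toy rows are invented and are not study data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical record: one NamSor inference paired with its proxy label
# (country or continent of affiliation). Field names are illustrative only.
@dataclass
class Inference:
    predicted: str      # most likely label returned by NamSor
    probability: float  # inference accuracy in [0, 1] (assumed field)
    truth: str          # country/continent of affiliation (proxy for origin)

Matcher = Callable[[str, str], bool]

def exact_match(predicted: str, truth: str) -> bool:
    """Plain scoring, as for "country", "continent" or "ethnicity"."""
    return predicted == truth

# Illustrative, deliberately partial set of Hispanic American countries.
HISPANIC_AMERICA = {"Mexico", "Argentina", "Chile", "Colombia", "Peru"}

def country2_match(predicted: str, truth: str) -> bool:
    """Scoring for country#2: Spain also matches Hispanic America, Portugal also matches Brazil."""
    if predicted == truth:
        return True
    if predicted == "Spain" and truth in HISPANIC_AMERICA:
        return True
    return predicted == "Portugal" and truth == "Brazil"

def continent2_match(predicted: str, truth: str) -> bool:
    """Scoring for continent#2: a prediction of Europe also matches America and Oceania."""
    if predicted == truth:
        return True
    return predicted == "Europe" and truth in {"America", "Oceania"}

def evaluate(rows: Iterable[Inference], match: Matcher,
             threshold: Optional[float] = None) -> Tuple[float, float]:
    """Return (errorCodedWithoutNA, naCoded).

    With a threshold, inferences whose probability is below it are counted
    as non-classified (NA) instead of being scored.
    """
    rows = list(rows)
    n_na = n_wrong = 0
    for r in rows:
        if threshold is not None and r.probability < threshold:
            n_na += 1
        elif not match(r.predicted, r.truth):
            n_wrong += 1
    n_classified = len(rows) - n_na
    error_without_na = n_wrong / n_classified if n_classified else float("nan")
    na_coded = n_na / len(rows) if rows else float("nan")
    return error_without_na, na_coded

# Toy usage (invented rows, not data from the study).
rows = [
    Inference("Japan", 0.9, "Japan"),
    Inference("Spain", 0.7, "Mexico"),
    Inference("Portugal", 0.4, "Brazil"),
    Inference("China", 0.3, "Japan"),
]
print(evaluate(rows, exact_match))          # (0.75, 0.0)
print(evaluate(rows, country2_match))       # (0.25, 0.0)
print(evaluate(rows, country2_match, 0.5))  # (0.0, 0.5)
```

The toy run mirrors the pattern reported in the abstract: broadening the matching rule (country#2) lowers errorCodedWithoutNA, and applying the accuracy threshold lowers it further at the cost of a non-zero naCoded.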

RESULTS

For the full sample and the subsample, respectively, errorCodedWithoutNA was 16.0% and 12.6% for "continent", 6.3% and 3.3% for "continent#2", 27.3% and 19.5% for "country", 19.7% and 11.4% for "country#2", and 20.2% and 14.8% for "ethnicity"; naCoded was 0% and 18.0% for all variables except "ethnicity" (0% and 10.7%).
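Because errorCodedWithoutNA is computed only over classified names, subsample figures can be re-expressed as shares of all names. The worked example below does this for "continent", assuming naCoded is measured over all names in the sample with low-accuracy inferences counted as non-classified; it is derived arithmetic on the reported percentages, not a figure reported by the authors.

```latex
\begin{aligned}
\text{misclassified share} &= \text{errorCodedWithoutNA} \times (1 - \text{naCoded}) \\
&= 0.126 \times (1 - 0.180) = 0.126 \times 0.820 \approx 0.103 \\
\text{correctly classified share} &= 1 - 0.180 - 0.103 \approx 0.717
\end{aligned}
```

Under these assumptions, about 10.3% of names in the subsample analysis would be misclassified by continent, 18.0% left unclassified, and about 71.7% correctly classified.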

CONCLUSION

NamSor is accurate in determining the continent of origin, especially when using the modified variable (continent#2) and/or restricting the analysis to names with accuracy ≥50%. The risk of misclassification is higher with country of origin or ethnicity, but decreases, as with continent of origin, when using the modified variable (country#2) and/or the subsample.

Similar articles

Where are primary type specimens of new mite species deposited?
Zootaxa. 2017 Dec 8;4363(1):1-54. doi: 10.11646/zootaxa.4363.1.1.