How well does NamSor perform in predicting the country of origin and ethnicity of individuals based on their first and last names?

Author affiliation

University Institute for Primary Care (IuMFE), University of Geneva, Geneva, Switzerland.

Publication information

PLoS One. 2023 Nov 16;18(11):e0294562. doi: 10.1371/journal.pone.0294562. eCollection 2023.

DOI: 10.1371/journal.pone.0294562

PMID: 37972002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10653483/

Abstract

BACKGROUND

We aimed to evaluate NamSor's performance in predicting the country of origin and ethnicity of individuals based on their first/last names.

METHODS

We retrieved the name and country of affiliation of all authors of PubMed publications in 2021, affiliated with universities in the twenty-two countries whose researchers authored ≥1,000 medical publications and whose percentage of migrants was <2.5% (N = 88,699). We estimated with NamSor their most likely "continent of origin" (Asia/Africa/Europe), "country of origin" and "ethnicity". We also examined two other variables that we created: "continent#2" ("Europe" replaced by "Europe/America/Oceania") and "country#2" ("Spain" replaced by "Spain/Hispanic American country" and "Portugal" replaced by "Portugal/Brazil"). Using "country of affiliation" as a proxy for "country of origin", we calculated for these five variables the proportion of misclassifications (= errorCodedWithoutNA) and the proportion of non-classifications (= naCoded). We repeated the analyses with a subsample consisting of all results with inference accuracy ≥50%.
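The two metrics and the recoded variables described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: errorCodedWithoutNA is taken as the share of wrong predictions among names that received a prediction, naCoded as the share of names left without a prediction, and the subsample is formed by treating predictions with a score below 0.5 as non-classified. The function names, the `COUNTRY_2` grouping table, and the toy data are hypothetical and are not taken from the study or from NamSor.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical grouping table in the spirit of "country#2": both the predicted
# and the reference country are mapped to a shared group before comparison.
# Only a few illustrative countries are listed here.
COUNTRY_2: Dict[str, str] = {
    "Spain": "Spain/Hispanic American country",
    "Mexico": "Spain/Hispanic American country",
    "Portugal": "Portugal/Brazil",
    "Brazil": "Portugal/Brazil",
}


def recode(values: Sequence[Optional[str]], table: Dict[str, str]) -> List[Optional[str]]:
    """Map each value to its group; values not in the table (and None) are unchanged."""
    return [table.get(v, v) if v is not None else None for v in values]


def apply_threshold(predicted: Sequence[str], scores: Sequence[float],
                    threshold: float = 0.5) -> List[Optional[str]]:
    """Treat predictions scored below the threshold as non-classified (None)."""
    return [p if s >= threshold else None for p, s in zip(predicted, scores)]


def error_coded_without_na(predicted: Sequence[Optional[str]],
                           reference: Sequence[str]) -> float:
    """Proportion of misclassifications among names that received a prediction."""
    classified = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    if not classified:
        return 0.0
    return sum(p != r for p, r in classified) / len(classified)


def na_coded(predicted: Sequence[Optional[str]]) -> float:
    """Proportion of names that received no prediction."""
    if not predicted:
        return 0.0
    return sum(p is None for p in predicted) / len(predicted)


# Toy data (invented): country of affiliation serves as the reference.
reference = ["Japan", "Mexico", "Brazil", "Nigeria", "Japan"]
predicted = ["Japan", "Spain", "Portugal", "Ghana", "China"]
scores = [0.9, 0.7, 0.6, 0.4, 0.8]

full_country = error_coded_without_na(predicted, reference)
full_country2 = error_coded_without_na(recode(predicted, COUNTRY_2),
                                       recode(reference, COUNTRY_2))

subsample = apply_threshold(predicted, scores, threshold=0.5)
sub_country = error_coded_without_na(subsample, reference)
sub_country2 = error_coded_without_na(recode(subsample, COUNTRY_2),
                                      recode(reference, COUNTRY_2))
sub_na = na_coded(subsample)

print(full_country, full_country2, sub_country, sub_country2, sub_na)
```

In this toy example the grouped variable removes the Spain/Mexico and Portugal/Brazil mismatches, and the threshold converts one low-score prediction into a non-classification. The study's Results show the same trade-off: fewer misclassifications at the cost of more non-classified names.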

RESULTS

For the full sample and the subsample, errorCodedWithoutNA was 16.0% and 12.6% for "continent", 6.3% and 3.3% for "continent#2", 27.3% and 19.5% for "country", 19.7% and 11.4% for "country#2", and 20.2% and 14.8% for "ethnicity"; naCoded was zero and 18.0% for all variables, except for "ethnicity" (zero and 10.7%).

CONCLUSION

NamSor is accurate in determining the continent of origin, especially when using the modified variable (continent#2) and/or restricting the analysis to names with accuracy ≥50%. The risk of misclassification is higher with country of origin or ethnicity, but decreases, as with continent of origin, when using the modified variable (country#2) and/or the subsample.
