
Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference.

Publication information

J Med Libr Assoc. 2021 Oct 1;109(4):609-612. doi: 10.5195/jmla.2021.1252.

DOI:10.5195/jmla.2021.1252
PMID:34858090
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8608220/
Abstract

OBJECTIVE

We recently showed that genderize.io is not a sufficiently powerful gender detection tool due to a large number of nonclassifications. In the present study, we aimed to assess whether the accuracy of inference by genderize.io can be improved by manipulating the first names in the database.

METHODS

We used a database containing the first names, surnames, and gender of 6,131 physicians practicing in a multicultural country (Switzerland). We uploaded the original CSV file (file #1), the file obtained after removing all diacritic marks, such as accents and cedilla (file #2), and the file obtained after removing all diacritic marks and retaining only the first term of the compound first names (file #3). For each file, we computed three performance metrics: proportion of misclassifications (errorCodedWithoutNA), proportion of nonclassifications (naCoded), and proportion of misclassifications and nonclassifications (errorCoded).
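
The two manipulations described in the methods can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the function names are hypothetical, and the abstract does not say how compound first names are delimited, so splitting on hyphens and whitespace is an assumption.

```python
import re
import unicodedata


def strip_diacritics(name: str) -> str:
    """Drop combining marks (accents, cedillas, umlauts) after NFD decomposition.

    Letters without a canonical decomposition (for example "ø") are left as-is.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_term(name: str) -> str:
    """Keep only the first term of a compound first name.

    Assumption: terms are separated by hyphens and/or whitespace.
    """
    return re.split(r"[\s\-]+", name.strip())[0]


def make_variants(name: str) -> tuple[str, str, str]:
    """Return the name as it would appear in file #1, file #2 and file #3."""
    file1 = name                    # original spelling
    file2 = strip_diacritics(name)  # diacritics removed
    file3 = first_term(file2)       # diacritics removed, first term only
    return file1, file2, file3


if __name__ == "__main__":
    for raw in ["François", "Marie-Hélène", "Jean Pierre"]:
        print(make_variants(raw))
```

Applying `make_variants` to every row of the first-name column would yield the three versions of the file that the study compares.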

RESULTS

naCoded, which was high for file #1 (16.4%), was reduced after data manipulation (file #2: 11.7%, file #3: 0.4%). As the increase in the number of misclassifications was small, the overall performance of genderize.io (i.e., errorCoded) improved, especially for file #3 (file #1: 17.7%, file #2: 13.0%, and file #3: 2.3%).
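
The three metrics can be computed from paired true and inferred labels as sketched below. The denominators are an assumption, since the abstract only calls the metrics "proportions": here naCoded and errorCoded are taken over all names, and errorCodedWithoutNA over classified names only. Under that assumption, errorCoded = naCoded + (1 - naCoded) × errorCodedWithoutNA, which allows a rough consistency check on the reported percentages. The function names are hypothetical.

```python
from typing import Optional, Sequence


def gender_metrics(truth: Sequence[str],
                   predicted: Sequence[Optional[str]]) -> dict:
    """Compute the three error metrics; None in `predicted` means unclassified."""
    total = len(truth)
    unclassified = sum(1 for p in predicted if p is None)
    classified = total - unclassified
    wrong = sum(1 for t, p in zip(truth, predicted)
                if p is not None and p != t)
    return {
        # misclassifications + nonclassifications, over all names
        "errorCoded": (wrong + unclassified) / total,
        # misclassifications over classified names only (assumed denominator)
        "errorCodedWithoutNA": wrong / classified if classified else float("nan"),
        # nonclassifications over all names
        "naCoded": unclassified / total,
    }


def implied_misclassification(error_coded: float, na_coded: float) -> float:
    """Solve errorCoded = naCoded + (1 - naCoded) * x for x."""
    return (error_coded - na_coded) / (1.0 - na_coded)


if __name__ == "__main__":
    truth = ["female", "male", "female", "male", "male"]
    predicted = ["female", "male", None, "female", "male"]
    print(gender_metrics(truth, predicted))

    # Reported (rounded) values from the results, as fractions.
    for label, err, na in [("file #1", 0.177, 0.164),
                           ("file #2", 0.130, 0.117),
                           ("file #3", 0.023, 0.004)]:
        print(label, round(implied_misclassification(err, na), 4))
```

With the rounded figures above, the implied misclassification rate among classified names comes out at roughly 1.5% to 2% for all three files. These are back-of-the-envelope estimates derived from the abstract's percentages, not values reported in it, but they are consistent with the statement that misclassifications increased only slightly.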

CONCLUSIONS

A relatively simple manipulation of the data improved the accuracy of gender inference by genderize.io. We recommend using genderize.io only with files that were modified in this way.



