Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality.

Author information

VanHelene Alexander D, Khatri Ishaani, Hilton C Beau, Mishra Sanjay, Gamsiz Uzun Ece D, Warner Jeremy L

Affiliations

Lifespan Cancer Institute, Rhode Island Hospital, Providence, Rhode Island, United States of America.

Center for Clinical Cancer Informatics and Data Science, Legorreta Cancer Center, Brown University, Providence, Rhode Island.

Publication information

PLOS Digit Health. 2024 Oct 29;3(10):e0000456. doi: 10.1371/journal.pdig.0000456. eCollection 2024 Oct.

Abstract

Meta-researchers commonly use tools that infer gender from first names, especially when studying gender disparities. However, these tools vary in accuracy, ease of use, and cost. The objective of this study was to compare the accuracy and cost of the commercial services Genderize and Gender API and the open-source gender R package. Differences in binary gender prediction accuracy among the three tools were evaluated on a multinational dataset of 32,968 gender-labeled clinical trial authors. Additionally, two datasets from previous studies, with 5,779 and 6,131 names respectively, were re-evaluated with current implementations of Genderize and Gender API. The accuracy of Genderize and Gender API was compared both with and without supplying trialists' country of origin in the API call; the gender R package was evaluated only without countries of origin. Accuracy was defined as the percentage of correct gender predictions, and accuracy differences between methods were evaluated using McNemar's test. Genderize and Gender API demonstrated 96.6% and 96.1% accuracy, respectively, when countries of origin were not supplied in the API calls. Both achieved their highest accuracy on German authors, exceeding 98%, and were least accurate on South Korean, Chinese, Singaporean, and Taiwanese authors, with accuracy below 82%. Genderize can provide accuracy similar to that of Gender API while being 4.85 times less expensive. The gender R package achieved below 86% accuracy on the full dataset. In the replication studies, Genderize and Gender API performed better than in the original publications. Our results indicate that Genderize and Gender API achieve similar accuracy on a multinational dataset, whereas the gender R package is uniformly less accurate than both.
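
The two evaluation steps named in the abstract, accuracy as the percentage of correct predictions and McNemar's test on paired predictions, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels are hypothetical, the exact binomial form of McNemar's test is an assumption (the abstract does not say which variant was used), and a missing prediction is simply counted as incorrect here.

```python
from math import comb


def accuracy(predicted, actual):
    """Share of predictions that match the reference gender labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mcnemar_exact(pred_a, pred_b, actual):
    """Exact two-sided McNemar test on paired predictions (assumed variant).

    Returns (b, c, p_value), where b counts names only tool A got right
    and c counts names only tool B got right.
    """
    b = sum(pa == y and pb != y for pa, pb, y in zip(pred_a, pred_b, actual))
    c = sum(pa != y and pb == y for pa, pb, y in zip(pred_a, pred_b, actual))
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis the discordant pairs split 50/50,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy labels for illustration only; these are not data from the study.
actual = ["F", "M", "F", "M", "F", "M", "F", "M"]
tool_a = ["F", "M", "F", "M", "F", "M", "F", "F"]
tool_b = ["F", "M", "M", "M", "M", "F", "F", "F"]

print(f"tool A accuracy: {accuracy(tool_a, actual):.1%}")
print(f"tool B accuracy: {accuracy(tool_b, actual):.1%}")
print("discordant counts and p-value:", mcnemar_exact(tool_a, tool_b, actual))
```

McNemar's test fits this design because every tool is scored on the same names, so only the names where the two tools disagree in correctness carry information about which one is more accurate.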

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8816/11521266/258e608a9513/pdig.0000456.g001.jpg
