Comparison and benchmark of name-to-gender inference services.

Author information

Lucía Santamaría, Helena Mihaljević

Affiliations

Amazon Development Center, Berlin, Germany.

University of Applied Sciences, Berlin, Germany.

Publication information

PeerJ Comput Sci. 2018 Jul 16;4:e156. doi: 10.7717/peerj-cs.156. eCollection 2018.

Abstract

The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person's gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services' free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.
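The abstract mentions error metrics and a parameter tuning procedure but does not spell them out. The Python sketch below illustrates the general idea under stated assumptions and is not the paper's own definition. It assumes that each service returns a gender ("m", "f", or "u" for unknown) together with a confidence score, and it treats "u" as a third prediction column in a 2x3 confusion matrix. The metric names (error_with_unknown, error_without_unknown, unknown_rate, gender_bias), the single confidence threshold, and the helper functions are all hypothetical. The tuning step is a plain grid search that minimizes the error among classified names while keeping the unknown rate below a cap.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical response format: (predicted gender, confidence), where the
# predicted gender is "m", "f", or "u" (service could not decide).
Response = Tuple[str, float]


def confusion(truth: Iterable[str], pred: Iterable[str]) -> Counter:
    """Count (true, predicted) pairs; true is "m"/"f", predicted is "m"/"f"/"u"."""
    return Counter(zip(truth, pred))


def metrics(c: Counter) -> dict:
    """Illustrative error metrics over a 2x3 confusion matrix (assumed definitions).

    Assumes a non-empty test set.
    """
    total = sum(c.values())
    misclassified = c[("m", "f")] + c[("f", "m")]
    unknown = c[("m", "u")] + c[("f", "u")]
    classified = total - unknown
    return {
        # unknowns are counted as errors
        "error_with_unknown": (misclassified + unknown) / total,
        # error rate among names that received a gender
        "error_without_unknown": misclassified / classified if classified else 0.0,
        # share of names left unclassified
        "unknown_rate": unknown / total,
        # > 0: more men labelled female than women labelled male
        "gender_bias": (c[("m", "f")] - c[("f", "m")]) / classified if classified else 0.0,
    }


def apply_threshold(responses: Sequence[Response], t: float) -> list:
    """Turn answers whose confidence is below t into "u" (unknown)."""
    return [g if p >= t else "u" for g, p in responses]


def tune_threshold(truth, responses, grid, max_unknown=0.25):
    """Grid search: minimise error among classified names, subject to unknown_rate <= max_unknown."""
    best: Optional[Tuple[float, dict]] = None
    for t in grid:
        m = metrics(confusion(truth, apply_threshold(responses, t)))
        if m["unknown_rate"] > max_unknown:
            continue  # too many names left unclassified at this threshold
        if best is None or m["error_without_unknown"] < best[1]["error_without_unknown"]:
            best = (t, m)
    return best  # None if no threshold satisfies the constraint
```

For example, take the made-up labels truth = ["m", "m", "f", "f", "f"] and responses [("m", 0.99), ("f", 0.55), ("f", 0.97), ("m", 0.60), ("u", 0.0)]. Calling tune_threshold(truth, responses, (0.5, 0.7, 0.9), max_unknown=0.65) selects 0.7. At 0.5, half of the classified names are wrong. At 0.7, the two remaining classifications are correct, but three of the five names are left unknown. Tightening max_unknown trades coverage back for error, which is why a constraint on the unknown rate is needed at all. These toy data are illustrative only and do not come from the study.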
