Ngai Chun Hau, Kilpatrick Alexander J, Ćwiek Aleksandra
East Asian Languages and Cultures Department, Indiana University, Bloomington, Indiana, United States of America.
Faculty of International Studies, Nagoya University of Business and Commerce, Nisshin, Aichi, Japan.
PLoS One. 2024 Mar 11;19(3):e0297440. doi: 10.1371/journal.pone.0297440. eCollection 2024.
This study investigates the sound symbolic expression of gender in Japanese names using machine learning algorithms. Its main goal is to explore how gender is expressed in the phonemes that make up Japanese names and whether the systematic sound-meaning mappings observed in Indo-European languages extend to Japanese. In addition, the study compares the performance of two machine learning algorithms. Random Forest and XGBoost models are trained using the sounds of names as predictors and the typical gender of the referents as the dependent variable. Each algorithm is cross-validated using k-fold cross-validation (28 folds) and tested on samples not included in the training cycle. Both algorithms are shown to be reasonably accurate at classifying names into gender categories; however, the XGBoost model performs significantly better than the Random Forest model. Feature importance scores reveal that certain sounds carry gender information. Namely, the voiced bilabial nasal /m/ and the voiceless velar consonant /k/ were associated with femininity, and the high front vowel /i/ was associated with masculinity. The associations observed for /i/ and /k/ stand contrary to typical patterns found in other languages, suggesting that Japanese is unique in the sound symbolic expression of gender. This study highlights the importance of considering cultural and linguistic nuances in sound symbolism research and underscores the advantage of XGBoost in capturing complex relationships within the data for improved classification accuracy. These findings contribute to the understanding of sound symbolism and gender associations in language.
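The data preparation described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the study: the phoneme inventory `PHONEMES`, the helper names `phoneme_counts` and `k_fold_indices`, and the toy romanized names are all hypothetical, and each romanized letter is treated as one phoneme, which is a simplification. The sketch only shows how names might be encoded as phoneme counts and partitioned into 28 folds; fitting a Random Forest or XGBoost classifier on each training split is left out.

```python
from collections import Counter

# Hypothetical phoneme inventory (assumption): the study's actual feature set
# is not given in the abstract.
PHONEMES = ["a", "i", "u", "e", "o", "k", "m", "n", "r", "s", "t", "y", "h"]


def phoneme_counts(name):
    """Return one count per inventory phoneme for a romanized name.

    Simplification: each letter is treated as a single phoneme.
    """
    counts = Counter(name.lower())
    return [counts[p] for p in PHONEMES]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Toy placeholder names, not data from the study.
    names = ["mika", "kenji", "yumi", "taro"]
    features = [phoneme_counts(n) for n in names]
    print(features[0])

    # 28 folds, as in the abstract, over a hypothetical 56-sample set.
    folds = list(k_fold_indices(56, 28))
    print(len(folds), len(folds[0][1]))
```

In practice the samples would normally be shuffled (and possibly stratified by gender label) before splitting, and a classifier would be fitted on each `train` split and scored on the matching `test` split.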