Khadilkar Kunal, KhudaBukhsh Ashiqur R, Mitchell Tom M
School of Computer Science, Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.
Golisano College of Computing and Information Sciences, Software Engineering Department, Rochester Institute of Technology, 20 Lomb Memorial Drive, Rochester, NY 14623, USA.
Patterns (N Y). 2021 Dec 9;3(2):100409. doi: 10.1016/j.patter.2021.100409. eCollection 2022 Feb 11.
We use a suite of cutting-edge natural language processing methods to quantify and characterize societal and gender biases in popular movie content. Our data set consists of English subtitles of popular movies from Bollywood, the Mumbai film industry, spanning 7 decades (700 movies). In addition, we include movies from Hollywood and movies nominated for the Academy Awards for contrastive purposes. Our findings indicate that while the overall portrayal of women has improved over time in popular movie dialogues from both Bollywood and Hollywood, modern films still exhibit considerable gender bias and have yet to achieve equal representation of genders. We also observe a strong bias favoring fair skin color in Bollywood content, and this bias is consistent across all time periods we considered. While our geographic representation analysis indicates improved inclusion over time for several Indian states, it also reveals a long-standing under-representation of many northeastern Indian states.