Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani, Thailand.
PLoS One. 2020 Jun 4;15(6):e0233746. doi: 10.1371/journal.pone.0233746. eCollection 2020.
Discrimination in the workplace is illegal, yet discriminatory practices remain a persistent global problem. Previous studies have analyzed job advertisements to identify discriminatory practices in the workplace. However, most of those studies relied on content analysis, manually coding the text of a limited number of samples, because working with large collections of job advertisements consisting of unstructured text is very challenging. To address these limitations, the present study applies text mining techniques to identify multiple types of direct discrimination in a large set of online job advertisements, using a method called Direct Discrimination Detection (DDD). DDD is built from a combination of N-grams and regular expressions (regex), following the exact-match principle of the Boolean retrieval model. A total of 8,969 online job advertisements in English and Bahasa Indonesia, published from May 2005 to December 2017, were collected from bursakerja-jateng.com as the data. The results reveal that direct discrimination still exists in the job-hunting process, including discrimination based on gender, marital status, physical appearance, and religion. The most frequent type of discrimination in job advertisements is based on age (66.27%), followed by gender (38.76%) and physical appearance (18.42%). Additionally, female job seekers are found to be the group most vulnerable to direct discrimination during recruitment. The results show that female job seekers face complex jeopardy in particular job positions compared with their male counterparts. Not only are they excluded because of their gender, but female job seekers also have to fulfil more requirements to get an opportunity to apply for jobs, such as being single, being young, meeting specific physical appearance requirements, and conforming to particular religious preferences.
This study illustrates the power and potential of optimizing computational methods for large-scale unstructured text data to analyze social phenomena.
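The combination of N-grams, regex, and Boolean exact matching described above can be sketched roughly as follows. This is a minimal illustration, not the DDD implementation: the term lists, the age pattern, the function names (`ngrams`, `detect`, `prevalence`), and the sample advertisements are all assumptions made for the example, since the abstract does not list the actual patterns used in the study.

```python
import re
from collections import Counter

# Hypothetical term lists (English and Bahasa Indonesia), for illustration only.
PHRASES = {
    "gender": {"male", "female", "pria", "wanita"},
    "marital_status": {"single", "belum menikah"},
    "physical_appearance": {"good looking", "berpenampilan menarik"},
    "religion": {"muslim", "kristen"},
}

# Hypothetical regex for requirements that contain numbers (e.g. an age limit).
REGEXES = {
    "age": re.compile(
        r"\b(?:max(?:imum|imal)?|maks(?:imal)?)\.?\s+(?:age\s+)?\d{2}\s+(?:years|tahun)\b",
        re.IGNORECASE,
    ),
}


def ngrams(text, max_n=2):
    """Return the set of word unigrams up to max_n-grams in the lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def detect(ad):
    """Boolean exact match: a category is flagged if any term or pattern occurs."""
    grams = ngrams(ad)
    found = {cat for cat, terms in PHRASES.items() if terms & grams}
    found |= {cat for cat, rx in REGEXES.items() if rx.search(ad)}
    return found


def prevalence(ads):
    """Percentage of ads flagged per category; one ad may count in several."""
    if not ads:
        return {}
    counts = Counter(cat for ad in ads for cat in detect(ad))
    categories = list(PHRASES) + list(REGEXES)
    return {cat: 100.0 * counts[cat] / len(ads) for cat in categories}


# Made-up sample advertisements, not taken from the study's data.
sample_ads = [
    "Dibutuhkan wanita, usia maksimal 25 tahun, belum menikah, berpenampilan menarik.",
    "Looking for a software engineer with 3 years of Python experience.",
    "Male driver wanted, max 35 years old.",
]

for ad in sample_ads:
    print(sorted(detect(ad)), "<-", ad)
print(prevalence(sample_ads))
```

Because a single advertisement can be flagged in several categories at once, per-category percentages computed this way need not sum to 100%, which is consistent with the figures reported above.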