Simchon Almog, Sutton Adam, Edwards Matthew, Lewandowsky Stephan
School of Psychological Science, University of Bristol, Bristol BS8 1QU, UK.
Department of Computer Science, University of Bristol, Bristol BS8 1QU, UK.
PNAS Nexus. 2023 Jun 7;2(6):pgad191. doi: 10.1093/pnasnexus/pgad191. eCollection 2023 Jun.
Building on big data from , we generated two computational text models: (i) predicting the personality of users from the text they have written and (ii) predicting the personality of users from the text they have consumed. The second model is novel and without precedent in the literature. We recruited active Reddit users () of fiction-writing communities. The participants completed a Big Five personality questionnaire and consented to having their Reddit activity scraped and used to create a machine learning model. We trained a natural language processing model [Bidirectional Encoder Representations from Transformers (BERT)] to predict personality from produced text (average performance: ). We then applied this model to a new set of Reddit users (), predicted their personality from their produced text, and trained a second BERT model to predict these predicted-personality scores from consumed text (average performance: ). In doing so, we provide the first glimpse into the linguistic markers of personality-congruent consumed content.
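The two-stage design described above (train on questionnaire-labelled produced text, pseudo-label a new set of users, then train on their consumed text) can be sketched in a few lines. This is only an illustration under stated assumptions: the `WordMeanRegressor` class is a hypothetical toy stand-in for the fine-tuned BERT regressor, the texts and trait scores are invented, and a single trait is modelled for brevity.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification of BERT's tokenizer)."""
    return text.lower().split()


class WordMeanRegressor:
    """Toy stand-in for a BERT regressor (an assumption, not the authors' model).

    Each word gets the mean trait score of the training texts containing it;
    a new text is scored by averaging the scores of its known words.
    """

    def fit(self, texts, scores):
        sums, counts = defaultdict(float), defaultdict(int)
        for text, score in zip(texts, scores):
            for word in set(tokenize(text)):
                sums[word] += score
                counts[word] += 1
        self.word_score = {w: sums[w] / counts[w] for w in sums}
        self.default = sum(scores) / len(scores)  # fallback for unseen words
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            known = [self.word_score[w] for w in tokenize(text)
                     if w in self.word_score]
            predictions.append(sum(known) / len(known) if known else self.default)
        return predictions


# Stage 1: questionnaire participants, text they PRODUCED, self-reported trait.
# All data below are invented for illustration.
survey_produced = ["party friends loud fun", "quiet book alone reading"]
survey_scores = [4.5, 1.5]  # hypothetical scores for one Big Five trait
production_model = WordMeanRegressor().fit(survey_produced, survey_scores)

# Stage 2a: a new set of users without questionnaires; predict the trait
# from their produced text to obtain pseudo-labels.
new_produced = ["fun party tonight", "reading alone again"]
pseudo_labels = production_model.predict(new_produced)

# Stage 2b: train a second model on the text those same users CONSUMED,
# using the predicted (not self-reported) scores as targets.
new_consumed = ["concert festival crowd", "library poetry silence"]
consumption_model = WordMeanRegressor().fit(new_consumed, pseudo_labels)

if __name__ == "__main__":
    print(pseudo_labels)
    print(consumption_model.predict(["festival crowd", "poetry silence"]))
```

Because the second model's targets are themselves predictions, any error in the first model carries over into the second; the sketch makes that dependency explicit by passing `pseudo_labels` directly into the second `fit` call.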