McDonnell Michelle, Owen Jason Edward, Bantum Erin O'Carroll
Veteran's Affairs Loma Linda Healthcare System, Loma Linda, CA, United States.
US Department of Veterans Affairs, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, United States.
JMIR Form Res. 2020 Oct 30;4(10):e18246. doi: 10.2196/18246.
Given the high volume of text-based communication through email, Facebook, Twitter, and other web-based and mobile apps, there are unique opportunities to use text to better understand underlying psychological constructs such as emotion. Emotion recognition in text is critical to commercial enterprises (eg, understanding the valence of customer reviews) and to current and emerging clinical applications (eg, as markers of clinical progress and risk of suicide). The Linguistic Inquiry and Word Count (LIWC) is a commonly used program for this purpose.
Given the wide use of this program, the purpose of this study is to update previous validation results with two newer versions of LIWC.
Tests of proportions were conducted using the total number of emotion words identified by human coders for each emotional category as the reference group. In addition to tests of proportions, we calculated F scores to evaluate the accuracy of LIWC 2001, LIWC 2007, and LIWC 2015.
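The two kinds of comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the F score is assumed to be the balanced F1 (harmonic mean of precision and recall, where recall corresponds to sensitivity), and the test of proportions is assumed to be a pooled two-sample z-test that treats the two samples as independent. The abstract does not state which variants were used.

```python
import math


def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from word-level counts.

    true_pos:  words tagged as emotional by both the program and human coders
    false_pos: words tagged by the program only (overidentification)
    false_neg: words tagged by human coders only (misses)
    """
    tagged = true_pos + false_pos
    reference = true_pos + false_neg
    precision = true_pos / tagged if tagged else 0.0
    recall = true_pos / reference if reference else 0.0  # recall = sensitivity
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def two_proportion_z(hits_a, hits_b, n_a, n_b):
    """Pooled two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Assumes independent samples, which is a
    simplification when both programs score the same text.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both proportions are 0 or 1, so there is no difference to test.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not study data.
    print(precision_recall_f(true_pos=80, false_pos=20, false_neg=20))
    print(two_proportion_z(hits_a=90, hits_b=60, n_a=100, n_b=100))
```

In this framing, a version that tags more words can gain sensitivity while losing precision, which is why the F score is reported alongside the tests of proportions.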
Results indicate that LIWC 2001, LIWC 2007, and LIWC 2015 each demonstrate good sensitivity for identifying emotional expression. LIWC 2007 and LIWC 2015 were significantly more sensitive than LIWC 2001 for identifying emotional expression and positive emotion; however, these more recent versions were also significantly more likely to overidentify emotional content than LIWC 2001. LIWC 2001 demonstrated significantly better F scores for identifying overall emotion, negative emotion, and anxiety compared with LIWC 2007 and LIWC 2015.
Taken together, these results suggest that LIWC 2001 most accurately reflects the emotional identification of human coders.