Yoon Sunmoo, Crupi Robert, Sun Frederick, Tipiani Dante, Patterson Melissa, Pottinger Tess, Kim Milea, Davis Nicole
General Medicine, Columbia University, New York, NY.
New York Presbyterian Hospital, New York, NY.
Stud Health Technol Inform. 2025 Apr 8;323:1-5. doi: 10.3233/SHTI250036.
We compared emotional valence scores determined by machine versus human rating, using a survey conducted from April to May 2024 on perceived attitudes toward the use of artificial intelligence (AI) among African American family caregivers of persons with Alzheimer's disease and related dementias (ADRD) (N=627). Participants answered open-ended questions qualitatively about the risks, benefits, and possible solutions for ten AI use cases, then rated each one. We then applied three machine learning algorithms to detect emotional valence scores in the text data and compared their means to the human ratings. The mean emotional valence scores derived from the text data via natural language processing (NLP) were negative regardless of algorithm (AFINN: -1.61 ± 2.76, Bing: -1.40 ± 1.52, Syuzhet: -0.67 ± 1.14), whereas the mean human rating was positive (2.30 ± 1.48, p=0.0001). Our findings have implications for survey design using self-rated instruments and open-ended questions in the NLP era.
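The scoring method described above can be illustrated with a minimal sketch of lexicon-based valence scoring, the approach behind AFINN, Bing, and Syuzhet. The tiny lexicon and sample responses below are hypothetical, not the actual study data or the real AFINN word list; they only show how a summed-valence score for open-ended text can diverge from a self-rated score.

```python
# Minimal sketch of AFINN-style lexicon-based emotional valence scoring.
# LEXICON is an illustrative subset (hypothetical), mapping word -> valence.
import re
from statistics import mean

LEXICON = {
    "helpful": 2, "benefit": 2, "trust": 1,
    "risk": -2, "bias": -3, "fear": -2, "privacy": -1,
}

def valence_score(text: str) -> int:
    """Sum lexicon valences over lowercase word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)

# Hypothetical open-ended responses and self-ratings (not study data).
responses = [
    "I fear the privacy risk and bias in AI.",
    "AI could be a helpful benefit for caregivers.",
]
machine_scores = [valence_score(r) for r in responses]
human_ratings = [4, 3]  # self-rated on a positive scale

print(mean(machine_scores), mean(human_ratings))
```

Because open-ended answers often enumerate risks even when the respondent's overall rating is positive, the lexicon sum can come out negative while the self-rating stays positive, mirroring the divergence the study reports.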