Park Jinkyung, Arunachalam Ramanathan, Silenzio Vincent, Singh Vivek K
School of Communication & Information, Rutgers University, New Brunswick, NJ, United States.
Department of Computer Science, Rutgers University, New Brunswick, NJ, United States.
JMIR Form Res. 2022 Jun 14;6(6):e34366. doi: 10.2196/34366.
Approximately 1 in 5 American adults experiences mental illness every year. Thus, mobile phone-based apps that use phone data and artificial intelligence techniques for mental health assessment have become increasingly important and are being developed rapidly. At the same time, multiple artificial intelligence-related technologies (eg, face recognition and search results) have recently been reported to be biased with respect to age, gender, and race. This study moves that discussion to a new domain: phone-based mental health assessment algorithms. It is important to ensure that such algorithms do not make systematically biased predictions across gender groups and thereby contribute to gender disparities.
This research aimed to analyze the susceptibility of multiple commonly used machine learning approaches to gender bias in mobile mental health assessment and to explore the use of an algorithmic disparate impact remover (DIR) approach to reduce bias levels while maintaining high accuracy.
First, we performed preprocessing and model training using the data set (N=55) obtained from a previous study. Accuracy levels and differences in accuracy across genders were computed using 5 different machine learning models. We selected the random forest model, which yielded the highest accuracy, for a more detailed audit and computed multiple metrics commonly used to assess fairness in the machine learning literature. Finally, we applied the DIR approach to reduce bias in the mental health assessment algorithm.
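The audit described above can be sketched with standard tooling. The following is a minimal illustration, assuming a scikit-learn workflow; the synthetic data stands in for the study's phone-derived features (which are not public), and the particular set of 5 classifiers is an assumption, as the abstract does not list them.

```python
# Minimal sketch of the gender audit, assuming a scikit-learn workflow.
# The study's features are not public and its exact 5 models are not
# listed, so the data and model set below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the N=55 data set: features X, a binary mental
# health label y, and a binary protected attribute (0=female, 1=male).
X, y = make_classification(n_samples=55, n_features=10, random_state=0)
gender = np.random.default_rng(0).integers(0, 2, size=55)

# Stratify on gender so both groups appear in the held-out split.
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, gender, test_size=0.25, stratify=gender, random_state=0)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    # Per-group accuracies; their absolute gap is the disparity audited.
    acc_f = accuracy_score(y_te[g_te == 0], pred[g_te == 0])
    acc_m = accuracy_score(y_te[g_te == 1], pred[g_te == 1])
    # Statistical parity difference (gap in positive-prediction rates),
    # one of the fairness metrics commonly reported in this literature.
    spd = pred[g_te == 0].mean() - pred[g_te == 1].mean()
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.3f}, "
          f"gap={abs(acc_f - acc_m):.3f}, spd={spd:.3f}")
```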
The highest observed accuracy for the mental health assessment was 78.57%. Although this accuracy level is encouraging, the gender-based audit revealed that the algorithm's performance differed significantly between the male and female groups (eg, the difference in accuracy across genders was 15.85%; P<.001). Similar trends were obtained for the other fairness metrics. This disparity in performance was reduced significantly after the DIR approach was applied to adapt the data used for modeling (eg, the difference in accuracy across genders fell to 1.66%, and the reduction was statistically significant; P<.001).
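The DIR approach, originally proposed by Feldman and colleagues, repairs the feature distributions so that they no longer differ across protected groups while leaving the labels untouched. A minimal sketch follows, assuming the AIF360 toolkit's DisparateImpactRemover on synthetic stand-in data; the abstract does not state which implementation the authors used, so treating AIF360 as the tool is an assumption.

```python
# Minimal sketch of the repair step, assuming AIF360's
# DisparateImpactRemover (the abstract names no implementation).
# Requires: pip install aif360 BlackBoxAuditing
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import DisparateImpactRemover

# Synthetic stand-in for the study's N=55 phone-derived data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(55, 5)),
                  columns=[f"feature_{i}" for i in range(5)])
df["gender"] = rng.integers(0, 2, size=55)          # protected attribute
df["mental_health"] = rng.integers(0, 2, size=55)   # binary label

dataset = BinaryLabelDataset(
    favorable_label=1.0,
    unfavorable_label=0.0,
    df=df,
    label_names=["mental_health"],
    protected_attribute_names=["gender"],
)

# repair_level=1.0 fully equalizes the per-gender feature distributions;
# the labels are unchanged, so any classifier can be retrained on the
# repaired features exactly as before.
dir_remover = DisparateImpactRemover(repair_level=1.0,
                                     sensitive_attribute="gender")
repaired = dir_remover.fit_transform(dataset)
X_repaired, y_repaired = repaired.features, repaired.labels.ravel()
```

The repair_level parameter interpolates between the original data (0) and a full repair (1), which is the knob that lets practitioners trade residual disparity against predictive accuracy, consistent with the paper's finding that the accuracy gap shrank while overall accuracy remained high.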
This study establishes the need for algorithmic auditing of phone-based mental health assessment algorithms and for the use of gender as a protected attribute when studying fairness in such settings. Such audits and remedial steps are the building blocks for the widespread adoption of fair and accurate mental health assessment algorithms in the future.