Douglas Mental Health University Institute, Montreal, QC, Canada.
McGill University, Montreal, QC, Canada.
Transl Psychiatry. 2024 Jun 21;14(1):263. doi: 10.1038/s41398-024-02970-4.
Major depressive disorder (MDD) is the leading cause of disability worldwide, yet treatment selection still proceeds by "trial and error". Given the varied presentation of MDD and the heterogeneity of treatment response, using machine learning to understand complex, non-linear relationships in data may be key to treatment personalization. Well-organized, structured data from clinical trials with standardized outcome measures are useful for training machine learning models; however, combining data across trials poses numerous challenges. There is also persistent concern that machine learning models can propagate harmful biases. We have created a methodology for organizing and preprocessing depression clinical trial data such that transformed variables, harmonized across disparate datasets, can be used as input for feature selection. Using Bayesian optimization, we identified an optimal multi-layer dense neural network that takes 21 clinical and sociodemographic features as input to perform differential treatment benefit prediction. Using this combined dataset of 5032 individuals and 6 drugs, we created a differential treatment benefit prediction model. Our model generalized well to the held-out test set, producing similar accuracy metrics in the test and validation sets, with an AUC of 0.7 for predicting binary remission. To address the potential for bias propagation, we used a bias testing performance metric to evaluate the model for harmful biases related to ethnicity, age, or sex. We present a full pipeline, from data preprocessing to model validation, that was used to create the first differential treatment benefit prediction model for MDD containing 6 treatment options.
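The abstract reports discrimination as an AUC of 0.7 for binary remission and describes a model that compares predicted benefit across 6 drugs. The sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline. It assumes the model yields one predicted remission probability per patient per drug, computes AUC with the rank-based (Mann-Whitney) formulation, and uses a hypothetical rule (`recommend`) that picks the drug with the highest predicted remission probability. The function names, drug labels, and toy values are all hypothetical.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the probability that a randomly chosen remitter
    (label 1) receives a higher score than a randomly chosen non-remitter
    (label 0). Ties count as half a win. O(n_pos * n_neg), which is fine
    for an illustration but not for large datasets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recommend(drug_probs: Dict[str, float]) -> str:
    """Hypothetical differential-benefit rule (an assumption, not the
    paper's method): return the drug whose predicted remission
    probability is highest for this patient."""
    return max(drug_probs, key=lambda drug: drug_probs[drug])


if __name__ == "__main__":
    # Toy predicted probabilities and observed remission labels,
    # not data from the study.
    scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]
    labels = [1, 1, 1, 0, 0, 0]
    print(auc(scores, labels))  # 6 of 9 remitter/non-remitter pairs are ranked correctly

    # Hypothetical per-drug predictions for a single patient.
    print(recommend({"drug_a": 0.31, "drug_b": 0.42, "drug_c": 0.27}))  # drug_b
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of remitters from non-remitters, which gives a reference point for the reported value of 0.7.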