Paschali Magdalini, Jiang Yu Hang, Siegel Spencer, Gonzalez Camila, Pohl Kilian M, Chaudhari Akshay, Zhao Qingyu
Department of Radiology, Stanford University, Stanford, CA, USA.
Department of Statistics, Stanford University, Stanford, USA.
Predict Intell Med. 2025;15155:24-34. doi: 10.1007/978-3-031-74561-4_3. Epub 2024 Oct 18.
Recent advancements in medicine have confirmed that brain disorders often comprise multiple subtypes of mechanisms, developmental trajectories, or severity levels. Such heterogeneity is often associated with demographic aspects (e.g., sex) or disease-related contributors (e.g., genetics). Thus, the predictive power of machine learning models used for symptom prediction varies across subjects based on such factors. To model this heterogeneity, one can assign each training sample a factor-dependent weight, which modulates the subject's contribution to the overall objective loss function. To this end, we propose to model the subject weights as a linear combination of the eigenbases of a spectral population graph that captures the similarity of factors across subjects. In doing so, the learned weights smoothly vary across the graph, highlighting sub-cohorts with high and low predictability. Our proposed sample weighting scheme is evaluated on two tasks. First, we predict initiation of heavy alcohol drinking in young adulthood from imaging and neuropsychological measures from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). Next, we detect Dementia vs. Mild Cognitive Impairment (MCI) using imaging and demographic measurements in subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Compared to existing sample weighting schemes, our sample weights improve interpretability and highlight sub-cohorts with distinct characteristics and varying model accuracy.
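The weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian-kernel similarity, the unnormalized Laplacian, the exponential mapping to positive weights, and all names (`laplacian_eigenbasis`, `sample_weights`, `weighted_loss`, `sigma`, `k`) are assumptions made for illustration. Restricting the weights to the span of the lowest-frequency eigenvectors is what makes them vary smoothly across the graph.

```python
import numpy as np

def laplacian_eigenbasis(factors, k, sigma=1.0):
    """Return the k lowest-frequency Laplacian eigenvectors of a factor-similarity graph."""
    # Pairwise squared Euclidean distances between subjects' factor vectors.
    diff = factors[:, None, :] - factors[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian-kernel adjacency without self-loops (an assumed similarity measure).
    adj = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(adj, 0.0)
    # Unnormalized graph Laplacian L = D - A.
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the first k columns are the smoothest graph signals.
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, :k]

def sample_weights(basis, coeffs):
    """Linear combination of eigenvectors, mapped to positive weights with mean 1."""
    raw = basis @ coeffs
    # The exponential mapping is an assumption; it only guarantees positivity.
    w = np.exp(raw - raw.max())
    return w * len(w) / w.sum()

def weighted_loss(per_sample_loss, weights):
    """Average per-subject loss, with each subject scaled by its weight."""
    return float(np.mean(weights * per_sample_loss))

# Toy data: 20 subjects with 2 hypothetical factors (e.g., age and sex, standardized).
rng = np.random.default_rng(0)
factors = rng.normal(size=(20, 2))
basis = laplacian_eigenbasis(factors, k=4)

# Zero coefficients give uniform weights; in training, the coefficients
# would be learned together with the predictive model.
coeffs = np.zeros(4)
weights = sample_weights(basis, coeffs)
losses = rng.uniform(size=20)
loss = weighted_loss(losses, weights)
```

Only `k` coefficients are learned rather than one free weight per subject, so the weights can be inspected as a smooth signal over the population graph.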