Liu Hangfan, Grothe Michel J, Rashid Tanweer, Labrador-Espinosa Miguel A, Toledo Jon B, Habes Mohamad
Neuroimage Analytics Laboratory (NAL) and Biggs Institute Neuroimaging Core, Glenn Biggs Institute for Neurodegenerative Disorders, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Unidad de Trastornos del Movimiento, Servicio de Neurología y Neurofisiología Clínica, Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC/Universidad de Sevilla, Seville, Spain.
IEEE Trans Emerg Top Comput Intell. 2023 Apr;7(2):308-318. doi: 10.1109/tetci.2021.3136587. Epub 2022 Jan 5.
Conventional clustering techniques for neuroimaging applications usually focus on capturing differences between subjects, while neglecting differences between features and the potential bias caused by degraded data quality. In practice, collected neuroimaging data are often contaminated by noise, which may lead to errors in clustering and clinical interpretation. Additionally, most methods ignore the importance of feature grouping for optimal clustering. In this paper, we exploit the underlying heterogeneous clusters of features as weak supervision for improved clustering of subjects, which we achieve by simultaneously clustering subjects and features via nonnegative matrix tri-factorization. To suppress noise, we further introduce adaptive regularization based on coefficient distribution modeling. In particular, unlike conventional sparsity regularization techniques that assume zero-mean coefficients, we form the distributions using the data of interest so that they better fit the non-negative coefficients. In this manner, the proposed approach is expected to be more effective and robust against noise. We compared the proposed method with standard techniques and recently published methods, and it demonstrated superior clustering performance on synthetic data with known ground truth labels. Furthermore, when we applied the proposed technique to magnetic resonance imaging (MRI) data from a cohort of patients with Parkinson's disease, we identified two stable and highly reproducible patient clusters characterized by frontal and posterior cortical/medial temporal atrophy patterns, respectively, which also showed corresponding differences in cognitive characteristics.
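The simultaneous clustering of subjects and features mentioned in the abstract can be illustrated with a plain nonnegative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates. This is only a minimal sketch under stated assumptions: it omits the adaptive, distribution-based regularization that the abstract describes, and the function name `nmtf`, its parameters, and the synthetic block data are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def nmtf(X, n_row_clusters, n_col_clusters, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: X (subjects x features) ~ F @ S @ G.T, all factors >= 0.

    Hypothetical helper for illustration; no noise-adaptive regularization.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, n_row_clusters)) + eps   # subject-cluster indicators
    S = rng.random((n_row_clusters, n_col_clusters)) + eps  # cluster associations
    G = rng.random((m, n_col_clusters)) + eps   # feature-cluster indicators
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius reconstruction error.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors


# Synthetic example (assumed data): 2 subject groups x 2 feature groups plus noise.
rng = np.random.default_rng(42)
block_means = np.array([[5.0, 1.0],
                        [1.0, 5.0]])
true_subject = np.repeat([0, 1], 30)   # 60 subjects
true_feature = np.repeat([0, 1], 20)   # 40 features
X = block_means[np.ix_(true_subject, true_feature)]
X = np.clip(X + 0.3 * rng.standard_normal(X.shape), 0.0, None)  # keep X nonnegative

F, S, G, errors = nmtf(X, n_row_clusters=2, n_col_clusters=2)

# Hard assignments: each subject/feature goes to its largest factor column.
subject_labels = F.argmax(axis=1)
feature_labels = G.argmax(axis=1)
```

Here the rows of `F` give soft subject-cluster memberships and the rows of `G` give soft feature-cluster memberships, so taking the `argmax` per row yields one hard label per subject and per feature. The reconstruction errors collected in `errors` give a simple way to check that the updates are reducing the objective.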