Smith Kenneth, Climer Sharlee
Department of Computer Science, University of Missouri - St. Louis, St. Louis, MO, United States.
Front Comput Neurosci. 2024 Sep 3;18:1388504. doi: 10.3389/fncom.2024.1388504. eCollection 2024.
Late-onset Alzheimer disease (AD) is a highly complex disease with multiple subtypes, as demonstrated by its disparate risk factors, pathological manifestations, and clinical traits. Discovery of biomarkers to diagnose specific AD subtypes is a key step towards understanding the biological mechanisms underlying this enigmatic disease, generating candidate drug targets, and selecting participants for drug trials. Popular statistical methods for evaluating candidate biomarkers, fold change (FC) and area under the receiver operating characteristic curve (AUC), were designed for homogeneous data. We demonstrate the inherent weaknesses of these approaches when they are used to evaluate subtypes representing less than half of the diseased cases. We introduce a unique evaluation metric that is based on the distribution of the values, rather than their magnitude, to identify analytes that are associated with a subset of the diseased cases, thereby revealing potential biomarkers for subtypes. Our approach, Bimodality Coefficient Difference (BCD), computes the difference between the degrees of bimodality for the cases and the controls. We demonstrate the effectiveness of our approach with large-scale synthetic data trials containing nearly perfect subtypes. To reveal novel AD biomarkers for heterogeneous subtypes, we applied BCD to gene expression data for 8,650 genes from 176 AD cases and 187 controls. Our results confirm the utility of BCD for identifying subtypes of heterogeneous diseases.
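The abstract describes BCD only as the difference between the degrees of bimodality of cases and controls and does not give a formula. The following is a minimal sketch, assuming that the degree of bimodality is measured with Sarle's bimodality coefficient computed from sample-corrected skewness and excess kurtosis. The function names `bimodality_coefficient` and `bcd` are hypothetical, and the synthetic data are illustrative only, not the study's data.

```python
import numpy as np


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient (an assumed choice, not confirmed by the abstract).

    BC = (G1^2 + 1) / (G2 + 3(n-1)^2 / ((n-2)(n-3))), where G1 is the
    sample-corrected skewness and G2 the sample-corrected excess kurtosis.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    g1 = m3 / m2 ** 1.5          # population (biased) skewness
    g2 = m4 / m2 ** 2 - 3.0      # population (biased) excess kurtosis
    skew = np.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bcd(cases, controls):
    """Bimodality of the cases minus bimodality of the controls."""
    return bimodality_coefficient(cases) - bimodality_coefficient(controls)


# Illustrative synthetic analyte: 30% of the cases form a shifted subtype
# (the shift and fraction are assumptions), while controls are unimodal.
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=187)
cases = rng.normal(0.0, 1.0, size=176)
cases[: int(0.3 * cases.size)] += 4.0

print(f"BCD for the synthetic analyte: {bcd(cases, controls):.3f}")
```

Under this assumed definition, an analyte whose values are unimodal in controls but split into two groups among the cases yields a positive score, even when the shifted subgroup is a minority of the cases.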