Mei Xiaoyue, Kabir Hannaneh, Conboy Michael J, Conboy Irina M
Department of Bioengineering and QB3 Institute, UC Berkeley, Berkeley, CA, 94720, USA.
Geroscience. 2025 Jun 25. doi: 10.1007/s11357-025-01750-2.
Biological aging is a complex non-linear process, with markedly distinct starting and end points, yet the biomarkers of its progression remain elusive. A key assumption of most machine learning (ML) approaches for age clocks is that predictive biomedical features can be identified via mathematical transformations of data to favor a linear transition from start to end, even if they erase any natural biological pattern. It is given that expected correlations, e.g., time lived (age) and time left to live (mortality), would persist in such mathematically optimized models, biologically meaningful or not. Here, we further clarify the workings of the clocks, explain the trade-off between mathematical optimization and biological interpretability, and discuss a hallmark of aging, inflammaging, that age clocks struggle to detect. We expand on the negative consequences of incoherence in linear models where some DNA methylation (DNAm) features increase with aging and disease, while others correspondingly decrease, yet positive weights are assigned to both. We quantify the misalignment between major DNAm clocks and actual changes in DNAm, providing an interactive visualization of these errors for each model. We demonstrate that major conventional age clocks are both incoherent and skewed toward leukocyte fractions and that rectifying incoherence makes the model balanced and not skewed toward neutrophils and better detects inflammaging. We briefly outline non-linear ML age clocks and the advantages of identifying a natural trajectory of aging directly from the primary data.
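The incoherence described in the abstract (features that move in opposite directions with age but receive weights of the same sign) can be illustrated with a minimal sketch. The abstract does not state how the authors quantify misalignment, so the metric below is an assumption: the fraction of nonzero-weight features whose weight sign disagrees with the sign of the feature's Pearson correlation with age. The function name `incoherence_fraction`, the synthetic data, and the two weight vectors are all hypothetical and are not taken from the paper or from any published clock.

```python
import numpy as np

def incoherence_fraction(X, age, weights):
    """Fraction of nonzero-weight features whose weight sign disagrees
    with the sign of the feature's Pearson correlation with age.
    (Illustrative metric only; not the authors' published method.)"""
    X = np.asarray(X, dtype=float)
    age = np.asarray(age, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Pearson correlation of each feature column with age
    Xc = X - X.mean(axis=0)
    ac = age - age.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (ac ** 2).sum())
    corr = (Xc * ac[:, None]).sum(axis=0) / denom

    # Only features the model actually uses are scored
    used = weights != 0
    mismatched = np.sign(corr[used]) != np.sign(weights[used])
    return mismatched.mean(), corr

# Synthetic stand-in for DNAm beta values: one site gains methylation
# with age, the other loses it (assumed slopes and noise levels).
rng = np.random.default_rng(0)
age = rng.uniform(20, 90, size=200)
up = 0.2 + 0.004 * age + rng.normal(0, 0.02, size=200)
down = 0.8 - 0.004 * age + rng.normal(0, 0.02, size=200)
X = np.column_stack([up, down])

# Hypothetical weight vectors: positive weights on both sites versus
# weights whose signs follow each site's direction of change.
incoherent_w = np.array([1.0, 1.0])
coherent_w = np.array([1.0, -1.0])

frac_incoherent, corr = incoherence_fraction(X, age, incoherent_w)
frac_coherent, _ = incoherence_fraction(X, age, coherent_w)
print(frac_incoherent, frac_coherent)
```

In this toy setup, the model with positive weights on both sites scores one of its two features as mismatched, while the sign-matched model scores none. A real analysis would use a clock's published coefficients and measured methylation data rather than simulated values.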