Debiased machine learning for ultra-high dimensional mediation analysis.

Authors

Wei Kecheng, Liu Yahang, Huang Chen, Lin Ruilang, Yu Yongfu, Qin Guoyou

Affiliation

Department of Biostatistics, School of Public Health, Fudan University, Shanghai 200032, China.

Publication

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf282.

Abstract

MOTIVATION

In ultra-high dimensional mediation analysis, confounding variables can influence both mediators and outcomes through complex functional forms. While machine learning (ML) approaches are effective at modeling such complex relationships, they can introduce bias when estimating mediation effects. In this article, we propose a debiased ML framework that mitigates this bias, enabling accurate identification of key mediators and precise estimation and inference of their respective contributions.

RESULTS

We construct an orthogonalized score function and use cross-fitting to reduce bias introduced by ML. To tackle ultra-high dimensional potential mediators, we implement screening and regularization techniques for variable selection and effect estimation. For statistical inference of the mediators' contributions, we use an adjusted Sobel-type test. Simulation results demonstrate the superior performance of the proposed method in handling complex confounding. Applying this method to Alzheimer's Disease Neuroimaging Initiative data, we identify several cytosine-phosphate-guanine sites where DNA methylation mediates the effect of body mass index on Alzheimer's Disease.
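To make the orthogonalization and cross-fitting idea concrete, here is a minimal sketch for a single mediator. It is not the authors' DML_HDMA implementation. It assumes a partially linear model, M = alpha*A + g(X) + e1 and Y = gamma*A + beta*M + f(X) + e2, which is an illustrative assumption rather than something stated in the abstract. It uses ridge regression on polynomial features as a stand-in for any ML learner, and it applies the classical (unadjusted) Sobel standard error. Screening, regularization across many mediators, and the adjusted Sobel-type test are omitted. All function names and the simulated data are hypothetical.

```python
import numpy as np

def poly_features(X, degree=3):
    # Intercept plus per-column powers up to `degree` (no interaction terms).
    cols = [np.ones((X.shape[0], 1))] + [X ** d for d in range(1, degree + 1)]
    return np.hstack(cols)

def fit_predict(X_tr, V_tr, X_te, lam=1.0):
    # Ridge regression on polynomial features: a stand-in for any ML learner.
    P_tr, P_te = poly_features(X_tr), poly_features(X_te)
    gram = P_tr.T @ P_tr + lam * np.eye(P_tr.shape[1])
    return P_te @ np.linalg.solve(gram, P_tr.T @ V_tr)

def cross_fit_residuals(X, V, n_folds=5, seed=0):
    # Out-of-fold residuals V - E[V | X]: each fold is predicted by a model
    # trained on the other folds, so no unit is scored by its own fit.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    resid = np.empty_like(V, dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        pred = fit_predict(X[train_idx], V[train_idx], X[test_idx])
        resid[test_idx] = V[test_idx] - pred
    return resid

def ols(D, y):
    # Least squares without intercept; classical standard errors.
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    res = y - D @ coef
    sigma2 = res @ res / (len(y) - D.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(D.T @ D)))
    return coef, se

def dml_mediation(X, A, M, Y):
    # Partial the confounders X out of exposure, mediator, and outcome,
    # then estimate the product-of-coefficients indirect effect.
    R = cross_fit_residuals(X, np.column_stack([A, M, Y]))
    a_res, m_res, y_res = R[:, 0], R[:, 1], R[:, 2]
    coef_m, se_m = ols(a_res[:, None], m_res)                  # M ~ A
    coef_y, se_y = ols(np.column_stack([a_res, m_res]), y_res)  # Y ~ A + M
    alpha, se_alpha = coef_m[0], se_m[0]
    beta, se_beta = coef_y[1], se_y[1]
    indirect = alpha * beta
    # Classical Sobel standard error for the product alpha * beta.
    se = np.sqrt(alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2)
    return indirect, se, indirect / se

# Simulated data with nonlinear confounding; the true indirect effect
# is 0.5 * 0.8 = 0.4.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
A = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)
M = 0.5 * A + X[:, 1] ** 2 + np.sin(X[:, 0]) + rng.normal(size=n)
Y = 0.3 * A + 0.8 * M + X[:, 0] ** 2 - X[:, 1] + rng.normal(size=n)

indirect, se, z = dml_mediation(X, A, M, Y)
print(f"indirect effect = {indirect:.3f}, SE = {se:.3f}, Sobel z = {z:.2f}")
```

The residual-on-residual regressions are what make the estimating equations insensitive to small errors in the nuisance fits, and the out-of-fold predictions keep overfitting in the learner from leaking into the effect estimate. In the ultra-high dimensional setting described above, a step like this would be run only after screening has reduced the candidate mediators to a manageable set.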

AVAILABILITY AND IMPLEMENTATION

The R function DML_HDMA implementing the proposed methods is available online at https://github.com/Wei-Kecheng/DML_HDMA.
