A New Paradigm for High-dimensional Data: Distance-Based Semiparametric Feature Aggregation Framework via Between-Subject Attributes.

Authors

Liu Jinyuan, Zhang Xinlian, Lin Tuo, Chen Ruohui, Zhong Yuan, Chen Tian, Wu Tsungchin, Liu Chenyu, Huang Anna, Nguyen Tanya T, Lee Ellen E, Jeste Dilip V, Tu Xin M

Affiliations

Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, U.S.A.

Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.

Publication information

Scand Stat Theory Appl. 2024 Jun;51(2):672-696. doi: 10.1111/sjos.12695. Epub 2023 Nov 8.

Abstract

This article proposes a distance-based framework motivated by the paradigm shift towards feature aggregation for high-dimensional data. The framework relies on neither the sparse-feature assumption nor permutation-based inference. Focusing on distance-based outcomes that preserve information without truncating any features, we develop a class of semiparametric regression models that encapsulate multiple sources of high-dimensional variables using pairwise outcomes of between-subject attributes. Further, we propose a strategy to address the interlocking correlations among pairs via U-statistics-based estimating equations (UGEE), which correspond to their unique efficient influence function (EIF). Hence, the resulting semiparametric estimators are robust to distributional misspecification while enjoying root-n consistency and asymptotic optimality, which facilitates inference. In essence, the proposed approach not only circumvents information loss due to feature selection but also improves the model's interpretability and computational feasibility. Simulation studies and applications to human microbiome and wearables data are provided, where the feature dimensions are in the tens of thousands.
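To make the idea of "pairwise outcomes of between-subject attributes" concrete, the following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance as the aggregating outcome, a single binary group label as the only covariate, and an identity-link mean model fitted by least squares over all pairs. The function names `pairwise_outcomes` and `fit_pairwise_linear` and the simulated data are hypothetical. The sketch gives point estimates only: pairs that share a subject are correlated, and the UGEE-based inference described in the abstract, which handles that correlation, is not reproduced here.

```python
import numpy as np
from itertools import combinations


def pairwise_outcomes(X, z):
    """Build pair-level data for all subject pairs i < j.

    X : (n, p) feature matrix; all p features are used, none are selected.
    z : (n,) group labels.
    Returns d (Euclidean distance per pair) and w (1.0 if the pair's
    group labels differ, 0.0 otherwise).
    """
    pairs = list(combinations(range(X.shape[0]), 2))
    i_idx = np.array([i for i, _ in pairs])
    j_idx = np.array([j for _, j in pairs])
    d = np.linalg.norm(X[i_idx] - X[j_idx], axis=1)
    w = (z[i_idx] != z[j_idx]).astype(float)
    return d, w


def fit_pairwise_linear(d, w):
    """Least-squares fit of the assumed model E[d_ij] = b0 + b1 * w_ij.

    Point estimate only; the correlation among pairs that share a
    subject is ignored here and would matter for standard errors.
    """
    A = np.column_stack([np.ones_like(w), w])
    beta, *_ = np.linalg.lstsq(A, d, rcond=None)
    return beta


# Simulated illustration (hypothetical data): n = 40 subjects, p = 5000 features.
rng = np.random.default_rng(0)
n, p = 40, 5000
z = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[z == 1] += 0.5  # small shift spread across every feature, no sparsity

d, w = pairwise_outcomes(X, z)
beta = fit_pairwise_linear(d, w)
print("within-group mean distance (b0):", beta[0])
print("extra between-group distance (b1):", beta[1])
```

In this toy setup the p-dimensional features collapse into one scalar outcome per pair, so the regression has two parameters regardless of p. A positive `b1` indicates that subjects from different groups are, on average, farther apart than subjects from the same group.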
