Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration.

Authors

Tang Lu, Song Peter X K

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication

J Mach Learn Res. 2016;17.

PMID: 29056876
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5647925/
Abstract

As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of study population, study design, or study coordination. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional techniques of remedy to data heterogeneity include the use of interactions and random effects, which are inferior to achieving desirable statistical power or providing a meaningful interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion method that allows us to identify and merge inter-study homogeneous parameter clusters in regression analysis, without the use of hypothesis testing approach. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso facilitates computing speed with no loss of statistical power. We conduct extensive simulation studies and provide an application example to demonstrate the performance of the new method with a comparison to the conventional methods.
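The fusion idea in the abstract can be illustrated with a small sketch. It is not the authors' procedure: as a simplifying assumption, it takes one already-estimated coefficient per study, sorts the estimates, and applies a one-dimensional fused lasso penalty to adjacent differences in that order, so that similar studies shrink to a common value. The function name `fuse_ordered`, the tuning value `lam`, and the toy numbers are all hypothetical; the solver is a plain projected-gradient iteration on the dual problem, chosen only because it needs nothing beyond NumPy.

```python
import numpy as np

def fuse_ordered(estimates, lam, n_iter=5000):
    """Illustrative fusion of per-study coefficient estimates.

    Solves  min_b 0.5 * ||b - y||^2 + lam * sum_i |b[i+1] - b[i]|
    where y holds the estimates sorted in increasing order.
    This two-step shortcut (estimate first, fuse second) is an assumption
    made for brevity, not the method of the paper.
    """
    y = np.asarray(estimates, dtype=float)
    order = np.argsort(y)          # estimated parameter ordering
    ys = y[order]

    def dt(u):
        # Transpose of the first-difference operator applied to u.
        padded = np.concatenate(([0.0], u, [0.0]))
        return -np.diff(padded)

    u = np.zeros(len(ys) - 1)      # dual variable, one per adjacent pair
    for _ in range(n_iter):
        b = ys - dt(u)
        # Gradient step on the dual, then projection onto [-lam, lam].
        # Step 0.25 is safe because the difference operator has norm^2 < 4.
        u = np.clip(u + 0.25 * np.diff(b), -lam, lam)

    b = ys - dt(u)
    fused = np.empty_like(b)
    fused[order] = b               # restore the original study order
    return fused

# Hypothetical slope estimates from five studies.
study_slopes = [1.0, 3.0, 1.05, 0.95, 3.1]
fused = fuse_ordered(study_slopes, lam=0.2)
# Studies sharing a fused value form one homogeneous cluster.
labels = np.unique(np.round(fused, 6), return_inverse=True)[1]
print(fused, labels)
```

With this toy input, the three estimates near 1 and the two near 3 merge into two clusters. Larger values of `lam` merge more studies, and `lam=0` leaves the estimates unchanged.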


