Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration

Authors

Tang Lu, Song Peter X K

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication

J Mach Learn Res. 2016;17.

PMID: 29056876

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5647925/

Abstract

As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of study population, study design, or study coordination. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional remedies for data heterogeneity include the use of interactions and random effects, which fall short of achieving desirable statistical power or providing a meaningful interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion method that allows us to identify and merge inter-study homogeneous parameter clusters in regression analysis, without resorting to hypothesis testing. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso improves computing speed with no loss of statistical power. We conduct extensive simulation studies and provide an application example to demonstrate the performance of the new method in comparison with conventional methods.
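
To make the idea in the abstract concrete, the following is one possible way to write an ordered fused lasso objective for coefficient clustering across studies. It is a sketch under assumed notation, not the paper's exact formulation: the symbols K (number of studies), p (number of covariates), L_k (loss of study k), \lambda (tuning parameter), and \tilde\beta (initial per-study estimates used for ordering) are all introduced here for illustration.

```latex
% Sketch only: the notation is assumed here, not quoted from the paper.
% \beta_{j,k}: coefficient of covariate j in study k.
% L_k: loss (e.g. negative log-likelihood) of study k.
% For each covariate j, studies are relabeled (1),...,(K) in increasing order
% of an initial per-study estimate \tilde\beta_{j,k}; only neighbors in that
% order are penalized toward each other.
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \sum_{k=1}^{K} L_k(\beta_{1,k},\dots,\beta_{p,k})
  \;+\; \lambda \sum_{j=1}^{p} \sum_{k=1}^{K-1}
  \bigl|\beta_{j,(k+1)} - \beta_{j,(k)}\bigr|
```

Under this reading, studies whose fitted coefficients for covariate j coincide form one cluster. Each covariate contributes K-1 penalty terms rather than the K(K-1)/2 terms of an all-pairs fusion penalty, which is one plausible source of the computational saving the abstract mentions.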
