Multi-scale affinities with missing data: Estimation and applications.

Author information

Zhang Min, Mishne Gal, Chi Eric C

Affiliations

Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA.

Halıcıoğlu Data Science Institute, University of California, San Diego, California, USA.

Publication information

Stat Anal Data Min. 2022 Jun;15(3):303-313. doi: 10.1002/sam.11561. Epub 2021 Nov 5.

Abstract

Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights has arguably not been given enough study. When a data matrix is completely observed, Gaussian kernel affinities can be used to quantify the local similarity between pairs of rows and pairs of columns. Computing weights in the presence of missing data, however, becomes challenging. In this paper, we propose a new method to construct row and column affinities even when data are missing by building off a co-clustering technique. This method takes advantage of solving the optimization problem for multiple pairs of cost parameters and filling in the missing values with increasingly smooth estimates. It exploits the coupled similarity structure among both the rows and columns of a data matrix. We show these affinities can be used to perform tasks such as data imputation, clustering, and matrix completion on graphs.
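As a point of reference for the completely observed case mentioned above, the following is a minimal sketch of Gaussian kernel affinities between pairs of rows and pairs of columns. It is not the method proposed in the paper and does not handle missing entries. The function names, the bandwidth parameter `sigma`, the kernel scaling `exp(-d^2 / (2 * sigma^2))`, and the toy matrix `X` are all assumptions made for illustration.

```python
import math


def gaussian_affinities(rows, sigma=1.0):
    """Return the n x n matrix w[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    `rows` is a list of equal-length lists of floats with no missing values.
    `sigma` is a hypothetical bandwidth; the paper may scale the kernel differently.
    """
    n = len(rows)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between row i and row j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            w[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return w


def transpose(matrix):
    """Swap rows and columns so column affinities reuse the same routine."""
    return [list(col) for col in zip(*matrix)]


# Toy, fully observed data matrix (illustrative values only).
X = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 0.0, 1.0],
]

row_affinities = gaussian_affinities(X, sigma=1.0)
col_affinities = gaussian_affinities(transpose(X), sigma=1.0)
```

In this sketch, rows 0 and 1 are nearly identical and so receive an affinity close to 1, while row 2 is far from both and receives an affinity close to 0. A missing entry would make the squared distance undefined, which is the difficulty the abstract points to.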

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc6d/9216212/03db318fe25f/nihms-1751214-f0001.jpg
