Inference and Learning in a Latent Variable Model for Beta Distributed Interval Data.

Authors

Mousavi Hamid, Buhl Mareike, Guiraud Enrico, Drefs Jakob, Lücke Jörg

Affiliations

Machine Learning Lab, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, 26129 Oldenburg, Germany.

Medical Physics Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, 26129 Oldenburg, Germany.

Publication information

Entropy (Basel). 2021 Apr 29;23(5):552. doi: 10.3390/e23050552.

Abstract

Latent Variable Models (LVMs) are well-established tools for a range of data processing tasks. Applications exploit the ability of LVMs to identify latent data structure in order to improve data (e.g., through denoising) or to estimate the relation between latent causes and measurements in medical data. In the latter case, LVMs in the form of noisy-OR Bayes nets represent the standard approach to relate binary latents (which represent diseases) to binary observables (which represent symptoms). Bayes nets with a binary representation for symptoms may be perceived as a coarse approximation, however. In practice, real disease symptoms can range from absent through mild and intermediate to very severe. Therefore, using disease/symptom relations as motivation, we here ask how standard noisy-OR Bayes nets can be generalized to incorporate continuous observables, e.g., variables that model symptom severity in an interval from healthy to pathological. This transition from binary to interval data poses a number of challenges, including a transition from a Bernoulli to a Beta distribution to model symptom statistics. While noisy-OR-like approaches are constrained to model how causes determine the observables' mean values, the use of Beta distributions additionally provides (and also requires) that the causes determine the observables' variances. To meet the challenges emerging when generalizing from Bernoulli to Beta distributed observables, we investigate a novel LVM that uses a maximum non-linearity to model how the latents determine means and variances of the observables. Given the model and the goal of likelihood maximization, we then leverage recent theoretical results to derive an Expectation Maximization (EM) algorithm for the suggested LVM. We further show how variational EM can be used to efficiently scale the approach to large networks. Finally, experimental results illustrate the efficacy of the proposed model using both synthetic and real data sets. Importantly, we show that the model produces reliable results in estimating causes, using proofs of concept and first tests based on real medical data and on images.
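
To make the generative idea in the abstract concrete, the following is a minimal sketch of how binary causes could set the mean and variance of Beta distributed observables through a maximum non-linearity. It is not the authors' implementation. The abstract does not give the exact parameterization, so several parts are assumptions made for illustration: the function names, the rule that the active cause with the largest mean "wins" and also supplies the variance, and the baseline mean and variance used when no cause is active. The only standard fact used is the moment relation of the Beta distribution: with mean m and variance v, the shape parameters are alpha = m * nu and beta = (1 - m) * nu, where nu = m(1 - m)/v - 1.

```python
import numpy as np


def beta_params_from_moments(mean, var):
    """Map mean and variance of a Beta distribution to its (alpha, beta) shapes.

    Valid only for 0 < mean < 1 and 0 < var < mean * (1 - mean).
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    if np.any(var <= 0) or np.any(var >= mean * (1.0 - mean)):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    nu = mean * (1.0 - mean) / var - 1.0  # nu equals alpha + beta
    return mean * nu, (1.0 - mean) * nu


def sample_interval_data(n, prior, means, variances, base_mean, base_var, rng):
    """Draw n samples from an illustrative max-combination Beta model.

    prior:      (H,) activation probability of each binary cause
    means:      (H, D) mean that cause h alone would give observable d
    variances:  (H, D) variance that cause h alone would give observable d
    base_mean, base_var: hypothetical baseline used when no cause is active
    """
    H, D = means.shape
    # Binary latents: each cause is switched on independently.
    s = rng.random((n, H)) < prior
    # Maximum non-linearity (assumed form): per observable, the active
    # cause with the largest mean determines the statistics.
    masked = np.where(s[:, :, None], means[None], -np.inf)  # (n, H, D)
    winner = masked.argmax(axis=1)                          # (n, D)
    any_active = s.any(axis=1)[:, None]                     # (n, 1)
    d_idx = np.arange(D)[None, :]
    mu = np.where(any_active, means[winner, d_idx], base_mean)
    var = np.where(any_active, variances[winner, d_idx], base_var)
    # Convert the selected moments to Beta shapes and sample in (0, 1).
    alpha, beta = beta_params_from_moments(mu, var)
    y = rng.beta(alpha, beta)
    return s, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([[0.8, 0.3], [0.4, 0.9]])  # two causes, two symptoms
    variances = np.full((2, 2), 0.01)
    s, y = sample_interval_data(5, np.array([0.5, 0.5]), means, variances,
                                base_mean=0.1, base_var=0.005, rng=rng)
    print(s.astype(int))
    print(y)
```

Each row of `y` is a vector of symptom severities in the unit interval, and the matching row of `s` records which causes generated it. Inference and learning in the paper work in the opposite direction, recovering the causes and parameters from observed data with EM; this sketch covers only the forward sampling step.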
