Mixture density networks for the indirect estimation of reference intervals.

Author information

Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 6, 91054, Erlangen, Germany.

Chair of Spatial Data Science and Statistical Learning, Georg-August-Universität Göttingen, Platz der Göttinger Sieben 3, 37073, Göttingen, Germany.

Publication information

BMC Bioinformatics. 2022 Jul 29;23(1):307. doi: 10.1186/s12859-022-04846-0.

Abstract

BACKGROUND

Reference intervals represent the expected range of physiological test results in a healthy population and are essential to support medical decision making. Particularly in the context of pediatric reference intervals, where recruitment regulations make prospective studies challenging to conduct, indirect estimation strategies are becoming increasingly important. Established indirect methods enable robust identification of the distribution of "healthy" samples from laboratory databases, which include unlabeled pathologic cases, but are currently severely limited when adjusting for essential patient characteristics such as age. Here, we propose the use of mixture density networks (MDN) to overcome this problem and model all parameters of the mixture distribution in a single step.
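To make "all parameters of the mixture distribution in a single step" concrete, the following is a minimal sketch of how a mixture density network's output layer is commonly mapped to valid mixture parameters and scored. It is not the authors' implementation: the function names `mdn_head`, `normal_pdf` and `mixture_nll` are hypothetical, Gaussian components are assumed, and the list `raw` stands in for the unconstrained outputs a neural network would produce for one input (for example, one patient's age).

```python
import math

def mdn_head(raw):
    """Map 3*K unconstrained outputs to (weights, means, sigmas) of a K-component mixture."""
    k = len(raw) // 3
    logits, means, log_sigmas = raw[:k], raw[k:2 * k], raw[2 * k:]
    # Softmax keeps the component weights positive and summing to one.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Exponentiation keeps the standard deviations positive.
    sigmas = [math.exp(s) for s in log_sigmas]
    return weights, list(means), sigmas

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution (assumed component family) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of one observation y under the mixture."""
    density = sum(w * normal_pdf(y, mu, s)
                  for w, mu, s in zip(weights, means, sigmas))
    return -math.log(density)

# Illustrative raw outputs for K = 2 components (made-up values, not from the paper).
weights, means, sigmas = mdn_head([0.0, 0.0, 1.0, 2.0, 0.0, 0.0])
loss = mixture_nll(1.5, weights, means, sigmas)
```

Because weights, means and standard deviations all come from the same network output, minimizing this loss over a data set fits every mixture parameter jointly as a function of the input.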

RESULTS

Estimated reference intervals from varying settings with simulated data demonstrate the ability to accurately estimate latent distributions from unlabeled data using different implementations of MDNs. Comparing the performance with alternative estimation approaches further highlights the importance of modeling the mixture component weights as a function of the input in order to avoid biased estimates for all other parameters and the resulting reference intervals. We also provide a strategy to generate partially customized starting weights to improve proper identification of the latent components. Finally, the application on real-world hemoglobin samples provides results in line with current gold standard approaches, but also suggests further investigations with respect to adequate regularization strategies in order to prevent overfitting the data.

CONCLUSIONS

Mixture density networks provide a promising approach capable of extracting the distribution of healthy samples from unlabeled laboratory databases while simultaneously and explicitly estimating all parameters and component weights as non-linear functions of the covariate(s), thereby allowing the estimation of age-dependent reference intervals in a single step. Further studies on model regularization and asymmetric component distributions are warranted to consolidate our findings and expand the scope of applications.
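As a rough illustration of how an age-dependent reference interval could be read off once the healthy component has been identified, the sketch below takes the central 95% of a normal component (the conventional 2.5th to 97.5th percentile definition, assumed here rather than stated in the abstract). The functions `healthy_mu` and `healthy_sigma` are hypothetical stand-ins for fitted network outputs, with made-up coefficients in arbitrary units. They are not results from the paper.

```python
from statistics import NormalDist

def reference_interval(mu, sigma, coverage=0.95):
    """Central interval of a normal 'healthy' component covering the given fraction."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Hypothetical stand-ins for the fitted, age-dependent parameters of the
# healthy component (arbitrary units, illustrative coefficients only).
def healthy_mu(age):
    return 10.0 + 0.5 * age

def healthy_sigma(age):
    return 1.0 + 0.05 * age

for age in (1, 5, 10):
    lower, upper = reference_interval(healthy_mu(age), healthy_sigma(age))
    print(f"age {age}: reference interval [{lower:.2f}, {upper:.2f}]")
```

With an asymmetric component family, the same idea would apply with that family's quantile function in place of `NormalDist.inv_cdf`.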

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39f9/9336034/b160599f9ff9/12859_2022_4846_Fig1_HTML.jpg
