An agglomerative hierarchical approach to visualization in Bayesian clustering problems.

Author information

Dawson K J, Belkhir K

Affiliation

Centre for Mathematical and Computational Biology, Rothamsted Research, Harpenden, Hertfordshire, UK.

Publication information

Heredity (Edinb). 2009 Jul;103(1):32-45. doi: 10.1038/hdy.2009.29. Epub 2009 Apr 1.

Abstract

Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, called the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. As the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety, unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
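
The idea of a tree whose node heights are set co-assignment probabilities can be illustrated with a minimal Python sketch. It is not the PartitionView implementation, and the abstract does not spell out the merge rule, so several things here are assumptions: the posterior is approximated by a list of sampled partitions (`samples`, one label vector per sample), the co-assignment probability of a set is estimated as the fraction of samples in which all its members share a label, and at each step the pair of clusters whose union has the highest estimated co-assignment probability is merged. The function names `coassignment_probability` and `agglomerate` are hypothetical.

```python
from itertools import combinations


def coassignment_probability(samples, members):
    """Estimate P(all `members` are in one cluster) as the fraction of
    sampled partitions in which they all carry the same label."""
    members = list(members)
    hits = sum(1 for labels in samples
               if len({labels[i] for i in members}) == 1)
    return hits / len(samples)


def agglomerate(samples):
    """Greedy agglomeration (assumed merge rule, see lead-in).

    Returns (nodes, roots): `nodes` lists each internal node as
    (set_of_individuals, height) in merge order; `roots` lists the
    sets left at the top, i.e. one root per tree of the forest.
    """
    n = len(samples[0])
    clusters = [frozenset([i]) for i in range(n)]
    nodes = []
    while len(clusters) > 1:
        # Pick the pair whose union is most probably co-assigned.
        best = max(
            combinations(clusters, 2),
            key=lambda pair: coassignment_probability(samples, pair[0] | pair[1]),
        )
        union = best[0] | best[1]
        height = coassignment_probability(samples, union)
        if height == 0.0:
            break  # no merge has any support: stop with a forest
        clusters = [c for c in clusters if c not in best] + [union]
        nodes.append((union, height))
    return nodes, clusters


# Toy data (made up for illustration): 4 individuals, 4 sampled partitions.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 2],
]
nodes, roots = agglomerate(samples)
for members, height in nodes:
    print(sorted(members), height)
```

Because a larger set can only be fully co-assigned when each of its subsets is, the height of a node in this sketch never exceeds the heights of its children. Stopping once no merge has positive support is what leaves a forest rather than a single tree in the toy example.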

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/2705916/b272bc6adcbd/ukmss-3290-f0001.jpg
