Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin.
BMC Bioinformatics. 2010 Jan 6;11:9. doi: 10.1186/1471-2105-11-9.
Cluster analysis is an important technique for the exploratory analysis of biological data. Such data are often high-dimensional and inherently noisy, and they contain outliers, which makes clustering challenging. Mixtures are versatile and powerful statistical models that cluster robustly in the presence of noise and have been applied successfully in a wide range of settings.
PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures and mixtures of dependence trees; semi-supervised learning is also supported. PyMix is licensed under the GNU General Public License (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.
PyMix is a useful tool for cluster analysis of biological data. Because the framework is general, PyMix can be applied to a wide range of problems and data sets.
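To make the idea of mixture-based clustering concrete, the following is a minimal sketch of a basic mixture model: a univariate Gaussian mixture fitted with the expectation-maximization (EM) algorithm, using only the Python standard library. It does not use or reproduce the PyMix API. The function names (`normal_pdf`, `responsibilities`, `fit_gaussian_mixture`), the quantile-based initialisation and the synthetic two-cluster data are assumptions chosen purely for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(data, weights, means, stds):
    """E-step: posterior probability of each component for each data point."""
    k = len(weights)
    resp = []
    for x in data:
        joint = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
        total = sum(joint) or 1e-300  # guard against numerical underflow
        resp.append([p / total for p in joint])
    return resp


def fit_gaussian_mixture(data, k=2, iterations=100, min_std=1e-3):
    """Fit a k-component univariate Gaussian mixture with plain EM.

    Illustrative helper (not part of PyMix).
    Returns (weights, means, stds, responsibilities).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation (an assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    stds = [max(overall_std, min_std)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        resp = responsibilities(data, weights, means, stds)
        # M-step: re-estimate each component from its weighted data points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)  # floor avoids collapse

    # Final E-step so the responsibilities match the returned parameters.
    return weights, means, stds, responsibilities(data, weights, means, stds)


# Synthetic data (assumed for the demo): two noisy groups centred at 0 and 6.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(6.0, 1.0) for _ in range(200)]

weights, means, stds, resp = fit_gaussian_mixture(data, k=2)

# Hard cluster assignment: the component with the highest posterior.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print("weights:", weights)
print("means:", means)
```

Each point receives a posterior probability for every component rather than a single hard label, which is one reason mixtures tend to cope well with noisy data: ambiguous points contribute fractionally to several clusters instead of being forced into one.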