PyMix - the Python mixture package - a tool for clustering of heterogeneous biological data.

Affiliation

Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin.

Publication information

BMC Bioinformatics. 2010 Jan 6;11:9. doi: 10.1186/1471-2105-11-9.

Abstract

BACKGROUND

Cluster analysis is an important technique for the exploratory analysis of biological data. Such data are often high-dimensional and inherently noisy, and they contain outliers, which makes clustering challenging. Mixture models are versatile and powerful statistical models that perform robustly for clustering in the presence of noise and have been applied successfully in a wide range of settings.

RESULTS

PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licensed under the GNU General Public License (GPL). PyMix has been used successfully for the analysis of biological sequence, complex disease and gene expression data.

CONCLUSIONS

PyMix is a useful tool for cluster analysis of biological data. Because the framework is general, PyMix can be applied to a wide range of problems and data sets.
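
To make the idea of clustering with a basic mixture model concrete, the following is a minimal sketch of expectation-maximization (EM) for a one-dimensional Gaussian mixture, written in plain Python. It does not use PyMix and should not be read as PyMix's API or as the authors' implementation; the function names (`normal_pdf`, `responsibilities`, `fit_gaussian_mixture`, `assign_cluster`), the quantile-based initialization and the fixed iteration count are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each component given the observation x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    if total == 0.0:
        # Numerical underflow for every component: fall back to uniform.
        return [1.0 / len(joint)] * len(joint)
    return [p / total for p in joint]


def fit_gaussian_mixture(data, k=2, iterations=100, min_std=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared standard deviation, and equal mixing weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    stds = [max(math.sqrt(overall_var), min_std)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: soft assignment of every point to every component.
        resp = [responsibilities(x, weights, means, stds) for x in data]
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return weights, means, stds


def assign_cluster(x, weights, means, stds):
    """Index of the component with the highest posterior probability for x."""
    post = responsibilities(x, weights, means, stds)
    return max(range(len(post)), key=lambda j: post[j])


if __name__ == "__main__":
    # Synthetic data drawn from two well-separated normal distributions.
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(5.0, 1.0) for _ in range(200)]
    w, m, s = fit_gaussian_mixture(sample, k=2)
    print("weights:", w)
    print("means:", m)
    print("cluster of x=4.2:", assign_cluster(4.2, w, m, s))
```

The soft assignments computed in the E-step are what distinguish mixture-based clustering from hard-assignment methods: each observation contributes to every component in proportion to its posterior probability. The advanced models named in the abstract (context-specific independence mixtures, mixtures of dependence trees) are not covered by this sketch.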
