Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France.
BMC Res Notes. 2022 Jul 7;15(1):241. doi: 10.1186/s13104-022-06129-6.
Data clustering is a common exploration step in the omics era, notably in genomics and proteomics, where many genes or proteins can be quantified from one or more experiments. Bayesian clustering is a powerful unsupervised algorithm that can classify several thousand genes or proteins. AutoClass C, its original implementation, handles missing data and automatically determines the best number of clusters, but it is not user-friendly.
We developed an online tool called AutoClassWeb, which provides a simple, easy-to-use web interface for Bayesian clustering with AutoClass. Input data are provided as TSV files and are quality-controlled. Results are provided in formats that ease further analysis with spreadsheet programs or with programming languages such as Python or R. AutoClassWeb is implemented in Python and is published under the 3-Clause BSD license. The source code is available at https://github.com/pierrepo/autoclassweb along with detailed documentation.
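As an illustration of what quality control of a TSV input file might involve, the following minimal Python sketch checks a tab-separated table before clustering. It is not AutoClassWeb's actual code: the function name `check_tsv` and the specific rules are assumptions made for this example (unique column names, unique row names in the first column, a consistent number of fields per line, and numeric values, with empty cells treated as missing data).

```python
import csv
import io


def check_tsv(text):
    """Return a list of problems found in tab-separated text (empty if none).

    Hypothetical checks, chosen for illustration only.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 2:
        return ["expected a header line and at least one data line"]

    header, data = rows[0], rows[1:]
    problems = []

    # Column names must be unique so that each variable is unambiguous.
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")

    seen_names = set()
    for number, row in enumerate(data, start=2):
        # Every data line must have as many fields as the header.
        if len(row) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, found {len(row)}"
            )
            continue

        # The first column is assumed to hold the gene or protein name.
        name, values = row[0], row[1:]
        if name in seen_names:
            problems.append(f"line {number}: duplicate row name {name!r}")
        seen_names.add(name)

        for column, value in zip(header[1:], values):
            if value == "":
                continue  # empty cell: treated as a missing value, allowed
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"line {number}: non-numeric value {value!r} in column {column!r}"
                )
    return problems
```

Calling `check_tsv` on the content of a file returns an empty list when the table passes these checks, and otherwise one message per problem, which could be shown to the user before any clustering is attempted.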