Fidaner Işık Barış, Cankorur-Cetinkaya Ayca, Dikicioglu Duygu, Kirdar Betul, Cemgil Ali Taylan, Oliver Stephen G
Department of Computer Engineering.
Department of Chemical Engineering, Bogazici University, Istanbul, Turkey and Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, UK.
Bioinformatics. 2016 Feb 1;32(3):388-97. doi: 10.1093/bioinformatics/btv532. Epub 2015 Sep 26.
Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets.
We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. The algorithm has two components: Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences), and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology, and it applies multiple hypothesis testing to report the significance of these enrichments. The algorithm runs as a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface; the latter was developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique Gene Ontology (GO) term enrichments to the identified clusters but also revealed novel insights into the biological systems examined that were not uncovered in the original publications.
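The abstract does not state which enrichment test or multiple-testing correction CLUSTERnGO uses. As a hedged illustration only, the sketch below scores one cluster against GO-style term annotations with a one-sided hypergeometric test and then adjusts the p-values with the Benjamini-Hochberg procedure. Both statistical choices, and the hypothetical function names `hypergeom_sf`, `benjamini_hochberg` and `enrich_cluster`, are assumptions for illustration, not the tool's actual implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn).

    Assumption: a one-sided over-representation test, used here for illustration.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich_cluster(cluster, annotations, background):
    """Hypothetical helper: test every term for over-representation in one cluster.

    cluster:     set of gene ids in the cluster
    annotations: dict mapping term -> set of annotated gene ids
    background:  set of all gene ids that could have been clustered
    Returns (term, p-value, BH-adjusted p-value) tuples sorted by p-value.
    """
    N = len(background)
    n = len(cluster & background)
    terms, pvals = [], []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(cluster & annotated)
        terms.append(term)
        pvals.append(hypergeom_sf(k, N, len(annotated), n))
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(terms, pvals, qvals), key=lambda t: t[1])


if __name__ == "__main__":
    background = {f"g{i}" for i in range(1, 11)}
    cluster = {"g1", "g2", "g3"}
    annotations = {"termA": {"g1", "g2", "g3"}, "termB": {"g1", "g4", "g5"}}
    for term, p, q in enrich_cluster(cluster, annotations, background):
        print(f"{term}\tp={p:.4g}\tq={q:.4g}")
```

In this toy example, with a 10-gene background and a 3-gene cluster whose members all carry `termA`, the hypergeometric tail probability for `termA` is 1/120 (about 0.0083), which becomes 2/120 after Benjamini-Hochberg adjustment across the two tested terms.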
The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG.
Supplementary data are available at Bioinformatics online.