CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

Authors

Fidaner Işık Barış, Cankorur-Cetinkaya Ayca, Dikicioglu Duygu, Kirdar Betul, Cemgil Ali Taylan, Oliver Stephen G

Affiliations

Department of Computer Engineering.

Department of Chemical Engineering, Bogazici University, Istanbul, Turkey and Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, UK.

Publication information

Bioinformatics. 2016 Feb 1;32(3):388-97. doi: 10.1093/bioinformatics/btv532. Epub 2015 Sep 26.

DOI: 10.1093/bioinformatics/btv532
PMID: 26411869
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4734040/

Abstract

MOTIVATION

Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets.

RESULTS

We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications.
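
The enrichment step described above can be illustrated with a minimal sketch. The abstract does not state which enrichment test or which multiple-testing correction CLUSTERnGO uses, so the hypergeometric tail test and the Benjamini-Hochberg adjustment below are assumptions chosen because they are common for GO term enrichment. The function names `hypergeom_sf` and `benjamini_hochberg` are hypothetical and do not come from the CLUSTERnGO source code.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster.

    N = genes in the background, K = background genes carrying the term,
    n = genes in the cluster, k = cluster genes carrying the term.
    (Assumed test; not confirmed as the one CLUSTERnGO applies.)
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    (Assumed correction; the abstract only says multiple hypothesis
    testing is applied.)
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is never larger than the one ranked after it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: one cluster of 5 genes tested against two terms
# in a background of 10 genes.
raw = [hypergeom_sf(5, 10, 5, 5), hypergeom_sf(1, 10, 5, 5)]
adjusted = benjamini_hochberg(raw)
```

Testing every cluster against every ontology term produces many p-values, which is why an adjustment of this kind is needed before any enrichment is reported as significant.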

AVAILABILITY AND IMPLEMENTATION

The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG.

CONTACT

sgo24@cam.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
