Buyukozkan Mustafa, Suhre Karsten, Krumsiek Jan
Department of Physiology and Biophysics, Institute for Computational Biomedicine, New York, NY 10021, USA.
Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA.
Bioinformatics. 2022 Jan 3;38(2):573-576. doi: 10.1093/bioinformatics/btab656.
The 'Subgroup Identification' (SGI) toolbox provides an algorithm to automatically detect clinical subgroups of samples in large-scale omics datasets. It is based on hierarchical clustering trees in combination with a specifically designed association testing and visualization framework that can process an arbitrary number of clinical parameters and outcomes in a systematic fashion. A multi-block extension allows for the simultaneous use of multiple omics datasets on the same samples. In this article, we first describe the functionality of the toolbox and then demonstrate its capabilities through application examples on a type 2 diabetes metabolomics study as well as two copy number variation datasets from The Cancer Genome Atlas.
SGI is an open-source package implemented in R. The package source code and hands-on tutorials are available at https://github.com/krumsieklab/sgi. The QMdiab metabolomics data is included in the package and can be downloaded from https://doi.org/10.6084/m9.figshare.5904022.
Supplementary data are available at Bioinformatics online.
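The general idea described in the abstract, building a hierarchical clustering tree and then testing each split of the tree for association with a clinical variable, can be illustrated with a minimal sketch. This is not the SGI package or its API (SGI is implemented in R). The function names (`build_tree`, `scan_splits`, `permutation_pvalue`), the average-linkage choice, the permutation test, the `min_size` threshold, and the toy data are all assumptions made for illustration. The sketch also omits multiple-testing correction and visualization.

```python
import random
from itertools import combinations
from math import dist
from statistics import mean


def build_tree(data):
    """Average-linkage agglomerative clustering (illustrative, O(n^3) or worse).

    Returns a nested tuple tree; leaves are integer sample indices.
    """
    # Each cluster is (subtree, list of member sample indices).
    clusters = [(i, [i]) for i in range(len(data))]
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Average pairwise Euclidean distance between the two clusters.
            d = sum(dist(data[a], data[b]) for a in mi for b in mj)
            d /= len(mi) * len(mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(node):
    """Collect the sample indices below a tree node."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(x)]) - mean(pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def scan_splits(tree, outcome, min_size=2):
    """Test every split whose two branches each hold >= min_size samples.

    Splits are visited top-down, so the root split comes first.
    """
    results = []

    def visit(node):
        if isinstance(node, int):
            return
        left, right = leaves(node[0]), leaves(node[1])
        if len(left) >= min_size and len(right) >= min_size:
            p = permutation_pvalue([outcome[i] for i in left],
                                   [outcome[i] for i in right])
            results.append((sorted(left), sorted(right), p))
        visit(node[0])
        visit(node[1])

    visit(tree)
    return results


# Toy "omics" profiles: two well-separated groups of five samples each.
profiles = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (0.0, -0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2), (5.0, 4.8)]
# Toy clinical variable that differs between the two groups.
clinical = [5.1, 5.3, 4.9, 5.0, 5.2, 8.0, 8.2, 7.9, 8.1, 8.3]

results = scan_splits(build_tree(profiles), clinical)
for left, right, p in results:
    print(left, "vs", right, "p =", round(p, 4))
```

In this toy setup, the first reported split is the root of the tree, which separates the two sample groups. Extending the sketch to several clinical variables would mean repeating the test per variable at each split and correcting the resulting p-values for multiple testing.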