Rahnavard Ali, Chatterjee Suvo, Sayoldin Bahar, Crandall Keith A, Tekola-Ayele Fasil, Mallick Himel
Department of Biostatistics and Bioinformatics, Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA.
Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
Bioinformatics. 2021 Oct 25;37(20):3588-3594. doi: 10.1093/bioinformatics/btab317.
The discovery of biologically interpretable and clinically actionable communities in heterogeneous omics data is a necessary first step toward deriving mechanistic insights into complex biological phenomena. Here, we present a novel clustering approach, omeClust, for community detection in omics profiles by simultaneously incorporating similarities among measurements and the overall complex structure of the data.
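To make "similarities among measurements" concrete, the following is a minimal, generic sketch of distance-based community detection using average-linkage agglomerative clustering. It is not omeClust's implementation: the function name `average_linkage_clusters`, the toy data and the fixed number of clusters are all assumptions made for illustration only.

```python
from itertools import combinations


def average_linkage_clusters(dist, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a list of cluster labels, one per sample.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        # p < q always holds, so deleting q does not shift index p.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: two tight groups of one-dimensional measurements.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in values] for a in values]
labels = average_linkage_clusters(dist, n_clusters=2)
```

In this sketch the number of communities is supplied by hand; a method that also accounts for the overall structure of the data would instead choose where to cut the hierarchy from the data itself.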
We show that omeClust outperforms published methods in inferring the true community structure, as measured by both sensitivity and misclassification rate on simulated datasets. We further validated omeClust on several diverse omics datasets, revealing new communities and functionally related groups in microbial strains, cell line gene expression patterns and fetal genomic variation. We also derived enrichment scores attributable to putatively meaningful biological factors in these datasets, which can be used to generate new testable hypotheses.
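As an illustration of one of the evaluation measures named above, here is a small sketch of how a misclassification rate between a known and an inferred community assignment could be computed. The abstract does not specify the exact definition used in the paper, so this assumes the common convention of matching predicted labels to true labels one-to-one before counting errors; the function name `misclassification_rate` and the example labels are hypothetical.

```python
from itertools import permutations


def misclassification_rate(true_labels, pred_labels):
    """Fraction of samples assigned to the wrong community.

    Cluster labels are arbitrary, so predicted labels are matched to the
    true labels by trying every one-to-one relabeling and keeping the best.
    Brute force: only practical for a handful of clusters.
    """
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so every predicted label can be mapped to something,
    # even when more clusters were predicted than truly exist.
    k = max(len(true_ids), len(pred_ids))
    targets = true_ids + [None] * (k - len(true_ids))

    best_correct = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# Hypothetical example: one of six samples lands in the wrong community.
truth = ["a", "a", "a", "b", "b", "b"]
predicted = [1, 1, 0, 0, 0, 0]
rate = misclassification_rate(truth, predicted)
```

For more than a few clusters, the exhaustive search over relabelings would normally be replaced by an assignment solver, but the quantity being computed is the same.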
omeClust is open-source software, and the implementation is available online at http://github.com/omicsEye/omeClust.
Supplementary data are available at Bioinformatics online.