High-sensitivity pattern discovery in large, paired multiomic datasets.

Affiliations

Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.

Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i378-i385. doi: 10.1093/bioinformatics/btac232.

Abstract

MOTIVATION

Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.

RESULTS

Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, between the microbiome and the host transcriptome, and between metabolomic profiling and human health phenotypes.
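
For orientation, the flat all-against-all baseline that HAllA is compared against can be sketched as follows. This is a minimal illustration and not HAllA's implementation. It assumes Spearman correlation as the similarity measure and Benjamini-Hochberg as the FDR procedure, and the function names `benjamini_hochberg` and `all_against_all` are hypothetical, chosen only for this example.

```python
import numpy as np
from scipy.stats import spearmanr


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds: the k-th smallest p-value is compared to alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True      # reject everything up to that rank
    return reject


def all_against_all(X, Y, alpha=0.05):
    """Test every feature (row) of X against every feature (row) of Y.

    X has shape (p, n) and Y has shape (q, n); columns are the same n samples.
    Returns (rho, pval, significant), each of shape (p, q).
    Assumption: Spearman correlation stands in for the similarity measure.
    """
    p, q = X.shape[0], Y.shape[0]
    rho = np.empty((p, q))
    pval = np.empty((p, q))
    for i in range(p):
        for j in range(q):
            rho[i, j], pval[i, j] = spearmanr(X[i], Y[j])
    # One flat FDR correction over all p * q pairwise tests.
    significant = benjamini_hochberg(pval.ravel(), alpha).reshape(p, q)
    return rho, pval, significant
```

In this baseline every one of the p × q feature pairs is tested and corrected together. As described above, HAllA instead combines hierarchical hypothesis testing with FDR correction to report block-wise associations; that hierarchical step is not reproduced here.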

AVAILABILITY AND IMPLEMENTATION

An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/058b/9235493/329d15be4984/btac232f1.jpg
