Continuous chromatin state feature annotation of the human epigenome.

Affiliation

School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.

Publication information

Bioinformatics. 2022 May 26;38(11):3029-3036. doi: 10.1093/bioinformatics/btac283.

Abstract

MOTIVATION

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, we propose an annotation strategy that outputs a vector of chromatin state features at each position rather than a single discrete label. Continuous modeling is common in other fields, such as topic modeling of text documents. We propose a method, epigenome-ssm-nonneg, that uses a non-negative state space model to efficiently annotate the genome with chromatin state features. We also propose several measures of the quality of a chromatin state feature annotation and compare the performance of several alternative methods according to these measures.
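The contrast between one discrete label and a vector of non-negative features per position can be made concrete with a small sketch. This is not the epigenome-ssm-nonneg implementation. It assumes a simplified objective: the squared error of reconstructing the assay matrix X (E assays × G positions) as W·Y, plus a penalty lam·Σ‖y_g − y_(g−1)‖² that encourages neighbouring positions to have similar features. It minimizes that objective by alternating projected gradient descent, clipping at zero to keep W and Y non-negative. The names objective, residuals, fit_features, lam and step are hypothetical, and the toy input is invented for illustration.

```python
import random


def objective(X, W, Y, lam):
    """Squared reconstruction error plus a smoothness penalty between
    adjacent positions (an assumed objective, for illustration only)."""
    E, G, K = len(X), len(X[0]), len(Y)
    total = 0.0
    for g in range(G):
        for e in range(E):
            pred = sum(W[e][k] * Y[k][g] for k in range(K))
            total += (X[e][g] - pred) ** 2
        if g > 0:
            total += lam * sum((Y[k][g] - Y[k][g - 1]) ** 2 for k in range(K))
    return total


def residuals(X, W, Y):
    """R[e][g] = X[e][g] - (W Y)[e][g]."""
    K = len(Y)
    return [[X[e][g] - sum(W[e][k] * Y[k][g] for k in range(K))
             for g in range(len(X[0]))] for e in range(len(X))]


def fit_features(X, K, lam=0.5, step=0.01, iters=200, seed=0):
    """Alternating projected gradient descent on W (E x K) and Y (K x G).

    Clipping at zero after each step keeps both factors non-negative.
    Returns W, Y and a history of objective values (the initial value,
    then one value per iteration).
    """
    E, G = len(X), len(X[0])
    rng = random.Random(seed)  # random init breaks symmetry between features
    W = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(E)]
    Y = [[rng.uniform(0.1, 1.0) for _ in range(G)] for _ in range(K)]
    history = [objective(X, W, Y, lam)]
    for _ in range(iters):
        # Y step: data term plus smoothness terms with both neighbours.
        R = residuals(X, W, Y)
        grad_Y = [[0.0] * G for _ in range(K)]
        for k in range(K):
            for g in range(G):
                grad = -2.0 * sum(W[e][k] * R[e][g] for e in range(E))
                if g > 0:
                    grad += 2.0 * lam * (Y[k][g] - Y[k][g - 1])
                if g < G - 1:
                    grad += 2.0 * lam * (Y[k][g] - Y[k][g + 1])
                grad_Y[k][g] = grad
        Y = [[max(0.0, Y[k][g] - step * grad_Y[k][g]) for g in range(G)]
             for k in range(K)]
        # W step: data term only, using the updated Y.
        R = residuals(X, W, Y)
        W = [[max(0.0, W[e][k] + step * 2.0 * sum(R[e][g] * Y[k][g] for g in range(G)))
              for k in range(K)] for e in range(E)]
        history.append(objective(X, W, Y, lam))
    return W, Y, history


# Toy input (invented): assay 0 marks positions 0-3; assays 1 and 2 mark positions 4-7.
X = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [0, 0, 0, 0, 1, 1, 1, 1]]
W, Y, history = fit_features(X, K=2)
# Position g is annotated with the vector [Y[0][g], Y[1][g]], not a single label.
print([round(Y[k][3], 2) for k in range(2)])
```

Each column of Y is the feature vector for one position. A discrete annotation keeps only one label per position, so it cannot express the relative strengths that the vector retains.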

RESULTS

We show that chromatin state features from epigenome-ssm-nonneg are more useful than both continuous and discrete alternatives for several downstream applications, including identifying expressed genes and enhancers. Therefore, we expect that these continuous chromatin state features will be valuable reference annotations for visualization and downstream analysis.

AVAILABILITY AND IMPLEMENTATION

Source code for epigenome-ssm is available at https://github.com/habibdanesh/epigenome-ssm and Zenodo (DOI: 10.5281/zenodo.6507585).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
