Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice.

Affiliation

Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.

Publication information

BMC Biol. 2022 Oct 5;20(1):221. doi: 10.1186/s12915-022-01426-9.

Abstract

BACKGROUND

Predicting cis-regulatory modules (CRMs) in a genome and predicting their functional states in the various cell/tissue types of the organism are two related, challenging computational tasks. Most current methods attempt to achieve both simultaneously, using data on multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, these methods suffer from high false discovery rates and limited applicability. To fill these gaps, we proposed a two-step strategy: first predict a map of CRMs in the genome, and then predict the functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that, by integrating numerous transcription factor ChIP-seq datasets in the organism, predicts CRMs in a genome more accurately and completely than existing methods. Here, we present machine-learning methods for the second step.

RESULTS

We showed that the functional states of all the CRMs in the genome in a given cell/tissue type could be accurately predicted by a variety of machine-learning classifiers using data on only 1 to 4 epigenetic marks. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on one cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal, at least in humans and mice. Moreover, we found that from tens of thousands to hundreds of thousands of CRMs were active in a given human or mouse cell/tissue type. Up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% were unique to a single cell/tissue type and might define it.
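To make the train-on-one, predict-on-another setup concrete, here is a minimal sketch. It is not the authors' implementation: the data are synthetic, the four marks are unnamed placeholders, the assumption that active CRMs carry a stronger signal on every mark is purely illustrative, and plain logistic regression stands in for the "variety of machine-learning classifiers" mentioned above. All function names are hypothetical.

```python
import math
import random


def simulate_cell_type(n_crms, n_marks, seed):
    """Synthetic stand-in for one cell/tissue type: per-CRM mark signals and 0/1 labels."""
    rng = random.Random(seed)
    signals, labels = [], []
    for _ in range(n_crms):
        active = rng.random() < 0.5
        # Assumption for illustration only: active CRMs have a higher mean signal.
        mean = 2.0 if active else 0.0
        signals.append([rng.gauss(mean, 1.0) for _ in range(n_marks)])
        labels.append(1 if active else 0)
    return signals, labels


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(signals, labels, epochs=200, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_marks = len(signals[0])
    n = len(signals)
    weights, bias = [0.0] * n_marks, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_marks, 0.0
        for x, y in zip(signals, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j in range(n_marks):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (active) if the predicted probability is at least 0.5, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(weights, bias, signals, labels):
    hits = sum(predict(weights, bias, x) == y for x, y in zip(signals, labels))
    return hits / len(labels)


# Train on one synthetic "cell type", evaluate on a second, independently drawn one.
train_x, train_y = simulate_cell_type(n_crms=500, n_marks=4, seed=1)
test_x, test_y = simulate_cell_type(n_crms=500, n_marks=4, seed=2)
weights, bias = train_logistic(train_x, train_y)
cross_accuracy = accuracy(weights, bias, test_x, test_y)
print(f"accuracy on held-out synthetic cell type: {cross_accuracy:.3f}")
```

The held-out set here is drawn from the same simulated distribution, so the sketch only shows the mechanics of cross-cell-type evaluation. It says nothing about how well real epigenetic signals transfer between cell types or species.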

CONCLUSIONS

Our two-step approach can accurately predict the functional states of all the CRMs in the genome in any cell/tissue type using data on only 1 to 4 epigenetic marks. It is also more cost-effective than existing methods, which typically use data on more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bc6/9535988/2af6b4000af7/12915_2022_1426_Fig1_HTML.jpg
