Identification, semantic annotation and comparison of combinations of functional elements in multiple biological conditions.

Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy.

Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy.

Publication information

Bioinformatics. 2022 Feb 7;38(5):1183-1190. doi: 10.1093/bioinformatics/btab815.

DOI: 10.1093/bioinformatics/btab815
PMID: 34864898

Abstract

MOTIVATION

Approaches such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) represent the standard for the identification of binding sites of DNA-associated proteins, including transcription factors and histone marks. Public repositories of omics data contain a huge number of experimental ChIP-seq data, but their reuse and integrative analysis across multiple conditions remain a daunting task.

RESULTS

We present the Combinatorial and Semantic Analysis of Functional Elements (CombSAFE), an efficient computational method able to integrate and take advantage of the valuable and numerous, but heterogeneous, ChIP-seq data publicly available in big data repositories. Leveraging natural language processing techniques, it integrates omics data samples with semantic annotations from selected biomedical ontologies; then, using hidden Markov models, it identifies combinations of static and dynamic functional elements throughout the genome for the corresponding samples. CombSAFE allows analyzing the whole genome, by clustering patterns of regions with similar functional elements and through enrichment analyses to discover ontological terms significantly associated with them. Moreover, it allows comparing functional states of a specific genomic region to analyze their different behavior throughout the various semantic annotations. Such findings can provide novel insights by identifying unexpected combinations of functional elements in different biological conditions.
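
As a rough illustration of the hidden Markov model step described above, the following minimal Python sketch decodes a most-likely sequence of functional states from binarized ChIP-seq signals with the Viterbi algorithm. It is not the CombSAFE implementation: the state names, the two hypothetical marks, the independent-Bernoulli emission model, and all probabilities are assumptions chosen only to make the example small and runnable.

```python
import math

def log_emission(mark_probs, observed):
    """Log-probability of one bin's presence/absence vector,
    assuming each mark is an independent Bernoulli variable."""
    return sum(math.log(p if present else 1.0 - p)
               for p, present in zip(mark_probs, observed))

def viterbi(bins, states, start, trans, emit):
    """Return the most likely hidden-state path for a list of binarized bins."""
    # best[s] = highest log-probability of any path ending in state s
    best = {s: math.log(start[s]) + log_emission(emit[s], bins[0]) for s in states}
    backpointers = []
    for observed in bins[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s
            prev = max(states, key=lambda p: best[p] + math.log(trans[p][s]))
            scores[s] = (best[prev] + math.log(trans[prev][s])
                         + log_emission(emit[s], observed))
            pointers[s] = prev
        best = scores
        backpointers.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy model: three states, two marks (mark A, mark B).
states = ["active", "repressed", "quiescent"]
start = {s: 1.0 / 3.0 for s in states}
# Each state tends to persist across neighboring bins.
trans = {a: {b: (0.8 if a == b else 0.1) for b in states} for a in states}
# Probability of observing (mark A, mark B) in each state.
emit = {
    "active": (0.9, 0.05),
    "repressed": (0.05, 0.9),
    "quiescent": (0.05, 0.05),
}

# One genomic region split into six bins; 1 = mark present, 0 = absent.
bins = [(1, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 1)]
path = viterbi(bins, states, start, trans, emit)
print(path)
# → ['active', 'active', 'quiescent', 'quiescent', 'repressed', 'repressed']
```

In a real analysis the transition and emission parameters would be learned from data rather than set by hand, and the observation vector would cover many more marks and bins.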

AVAILABILITY AND IMPLEMENTATION

The Python implementation of the CombSAFE pipeline is freely available for non-commercial use at: https://github.com/DEIB-GECO/CombSAFE.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
