A comparative analysis of ENCODE and Cistrome in the context of TF binding signal.

Affiliations

Lee Kong Chian School of Medicine, Nanyang Technological University, 9 Nanyang Drive, 636921, Singapore, Singapore.

Department of Electronics, Information and Bioengineering, Politecnico di Milano, 32 Piazza Leonardo da Vinci, 20133, Milano, Italy.

Publication information

BMC Genomics. 2024 Aug 30;25(Suppl 3):817. doi: 10.1186/s12864-024-10668-6.

DOI: 10.1186/s12864-024-10668-6
PMID: 39210256
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11363379/

Abstract

BACKGROUND

With the rise of publicly available genomic data repositories, it is now common for scientists to rely on computational models and preprocessed data, either as controls or to discover new knowledge. However, different repositories adhere to different principles and guidelines, and data processing plays a significant role in the quality of the resulting datasets. Two popular repositories for transcription factor binding site data, ENCODE and Cistrome, process the same biological samples in alternative ways, and their results are not always consistent. Moreover, the output format of the processing (BED narrowPeak) exposes a feature, the signalValue, which is seldom used in consistency checks but can offer valuable insight into the quality of the data.
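
For orientation, the signalValue mentioned above is the seventh tab-separated column of a BED narrowPeak record. The following is a minimal sketch of reading it, not the authors' pipeline; the `Peak` type, the `parse_narrowpeak_line` helper, and the example record are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """Hypothetical container for the narrowPeak fields used here."""
    chrom: str
    start: int           # 0-based, inclusive (BED convention)
    end: int             # exclusive (BED convention)
    signal_value: float  # column 7 of narrowPeak


def parse_narrowpeak_line(line: str) -> Peak:
    """Parse one tab-separated narrowPeak record, keeping position and signalValue."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError("a narrowPeak record needs at least 7 columns to carry signalValue")
    return Peak(fields[0], int(fields[1]), int(fields[2]), float(fields[6]))


# Made-up record: chrom, start, end, name, score, strand,
# signalValue, pValue, qValue, summit offset.
example = "chr1\t100\t250\tpeak_1\t870\t.\t42.5\t12.3\t9.8\t75"
peak = parse_narrowpeak_line(example)
```

The sketch keeps only the positional fields and the signalValue, since those are the quantities the abstract discusses.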

RESULTS

We provide evidence that data points with high signalValue(s) (top 25% of values) are more likely to be consistent between ENCODE and Cistrome in human cell lines K562, GM12878, and HepG2. In addition, we show that filtering according to said high values improves the quality of predictions for a machine learning algorithm that detects transcription factor interactions based only on positional information. Finally, we provide a set of practices and guidelines, based on the signalValue feature, for scientists who wish to compare and merge narrowPeaks from ENCODE and Cistrome.
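
As a rough illustration of the filtering idea described above, the sketch below keeps the top 25% of peaks by signalValue from one source and measures how many of them overlap a peak from the other source. It is a toy example under stated assumptions, not the procedure used in the paper: the function names, the rank-based cutoff, the any-overlap criterion, and the data are all hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end, signalValue); coordinates are half-open as in BED.
Peak = Tuple[str, int, int, float]


def top_quartile(peaks: List[Peak]) -> List[Peak]:
    """Keep the 25% of peaks with the highest signalValue (at least one peak)."""
    if not peaks:
        return []
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)
    keep = max(1, len(ranked) // 4)
    return ranked[:keep]


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_fraction(query: List[Peak], reference: List[Peak]) -> float:
    """Fraction of query peaks that overlap any reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Made-up peaks standing in for the two repositories.
encode_peaks = [
    ("chr1", 100, 200, 50.0),
    ("chr1", 300, 400, 5.0),
    ("chr1", 500, 600, 4.0),
    ("chr2", 100, 200, 3.0),
]
cistrome_peaks = [("chr1", 150, 250, 40.0), ("chr2", 500, 600, 2.0)]

all_fraction = overlap_fraction(encode_peaks, cistrome_peaks)
top_fraction = overlap_fraction(top_quartile(encode_peaks), cistrome_peaks)
```

In this toy data the high-signal subset overlaps the second source more often than the full set does, which mirrors the direction of the reported result without reproducing it. The quadratic scan is acceptable for a sketch; real peak sets would call for an interval index or a dedicated tool.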

CONCLUSIONS

The signalValue feature is informative and can be effectively used to highlight consistent areas of overlap between different sources of TF binding sites that expose it. Its applicability extends downstream to positional machine learning algorithms, making it a powerful tool for performance tuning and data aggregation.
