Adaptive bandwidth kernel density estimation for next-generation sequencing data.

Author information

Ramachandran Parameswaran, Perkins Theodore J

Publication information

BMC Proc. 2013 Dec 20;7(Suppl 7):S7. doi: 10.1186/1753-6561-7-S7-S7.

DOI:10.1186/1753-6561-7-S7-S7
PMID:24564977
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4043421/
Abstract

BACKGROUND

High-throughput sequencing experiments can be viewed as measuring some sort of a "genomic signal" that may represent a biological event such as the binding of a transcription factor to the genome, locations of chromatin modifications, or even a background or control condition. Numerous algorithms have been developed to extract different kinds of information from such data. However, there has been very little focus on the reconstruction of the genomic signal itself. Such reconstructions may be useful for a variety of purposes ranging from simple visualization of the signals to sophisticated comparison of different datasets.

METHODS

Here, we propose that adaptive-bandwidth kernel density estimators are well-suited for genomic signal reconstructions. This class of estimators is a natural extension of the fixed-bandwidth estimators that have been employed in several existing ChIP-Seq analysis programs.
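To make the fixed-versus-adaptive distinction concrete, here is a minimal Python sketch over one-dimensional read positions. It is an illustration only, not the authors' implementation. The Gaussian kernel, the toy read positions, and the k-nearest-neighbour bandwidth rule (the hypothetical `knn_bandwidths`, with hypothetical parameters `k` and `min_h`) are all assumptions. The abstract does not say which rule the paper uses to set each read's bandwidth.

```python
import math


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(reads, x, h):
    """Fixed-bandwidth estimate: every read contributes a kernel of width h."""
    return sum(gaussian((x - r) / h) / h for r in reads) / len(reads)


def knn_bandwidths(reads, k, min_h=1.0):
    """Hypothetical rule: read i's bandwidth is the distance to its k-th
    nearest read, floored at min_h so stacked duplicate reads never get h = 0.
    Quadratic time, kept simple for readability."""
    hs = []
    for r in reads:
        dists = sorted(abs(r - o) for o in reads)  # dists[0] == 0 is the read itself
        hs.append(max(dists[min(k, len(dists) - 1)], min_h))
    return hs


def adaptive_kde(reads, x, hs):
    """Sample-point adaptive estimate: read i contributes a kernel of width hs[i],
    so reads in dense regions get narrow kernels and isolated reads get wide ones."""
    return sum(gaussian((x - r) / h) / h for r, h in zip(reads, hs)) / len(reads)


# Toy read positions (hypothetical): a dense cluster plus two isolated reads.
reads = [100, 102, 105, 110, 400, 900]
hs = knn_bandwidths(reads, k=2)
print(hs)  # → [5, 3, 5, 8, 295, 790]
print(fixed_kde(reads, 104, h=10), adaptive_kde(reads, 104, hs))
```

Setting every entry of `hs` to the same value recovers the fixed-bandwidth estimator. This is the sense in which the adaptive class extends it. The neighbour search above is quadratic for readability. A genome-scale version would presumably sort positions and search only a local window.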

RESULTS

Using a set of ChIP-Seq datasets from the ENCODE project, we show that adaptive-bandwidth estimators have greater accuracy at signal reconstruction compared to fixed-bandwidth estimators, and that they have significant advantages in terms of visualization as well. For both fixed and adaptive-bandwidth schemes, we demonstrate that smoothing parameters can be set automatically using a held-out set of tuning data. We also carry out a computational complexity analysis of the different schemes and confirm through experimentation that the necessary computations can be readily carried out on a modern workstation without any significant issues.
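The abstract states that smoothing parameters can be set from held-out tuning data, but it does not give the selection criterion. The sketch below assumes one common choice: the mean held-out log-likelihood. It tunes a fixed bandwidth on simulated reads. The function names, the 20% tuning split, the candidate grid, and the simulated two-peak data are all hypothetical. The same loop would apply unchanged to the neighbour count of an adaptive scheme.

```python
import math
import random


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(train, x, h):
    """Fixed-bandwidth Gaussian KDE built from the training reads, evaluated at x."""
    return sum(gaussian((x - r) / h) / h for r in train) / len(train)


def heldout_loglik(train, tune, h, floor=1e-300):
    """Mean log density of the held-out tuning reads (floored to avoid log(0))."""
    return sum(math.log(max(fixed_kde(train, x, h), floor)) for x in tune) / len(tune)


def pick_bandwidth(reads, candidates, tune_frac=0.2, seed=0):
    """Hypothetical helper: hold out a random fraction of reads, score each
    candidate bandwidth on them, and return the best one plus all scores."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    n_tune = max(1, int(len(shuffled) * tune_frac))
    tune, train = shuffled[:n_tune], shuffled[n_tune:]
    scores = {h: heldout_loglik(train, tune, h) for h in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two simulated "peaks" of read positions (not real ChIP-Seq data).
sim = random.Random(1)
reads = [sim.gauss(1000, 20) for _ in range(300)] + [sim.gauss(5000, 20) for _ in range(300)]
best, scores = pick_bandwidth(reads, [1, 5, 20, 100, 1000])
print(best, scores)
```

Each candidate costs one kernel evaluation per (tuning read, training read) pair in this naive form. Truncating the kernel to a few bandwidths around each point is one way to keep such a search cheap on real datasets.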

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5acd/4043421/87776f3a2527/1753-6561-7-S7-S7-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5acd/4043421/018be6bd0b39/1753-6561-7-S7-S7-2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5acd/4043421/c07602267064/1753-6561-7-S7-S7-3.jpg

Similar articles

1
Adaptive bandwidth kernel density estimation for next-generation sequencing data.
BMC Proc. 2013 Dec 20;7(Suppl 7):S7. doi: 10.1186/1753-6561-7-S7-S7.
2
Optimal bandwidth estimators of kernel density functionals for contaminated data.
J Appl Stat. 2021 Jul 11;48(13-15):2239-2258. doi: 10.1080/02664763.2021.1944999. eCollection 2021.
3
How bandwidth selection algorithms impact exploratory data analysis using kernel density estimation.
Psychol Methods. 2014 Sep;19(3):428-443. doi: 10.1037/a0036850. Epub 2014 Jun 2.
4
Online Anomaly Detection With Bandwidth Optimized Hierarchical Kernel Density Estimators.
IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):4253-4266. doi: 10.1109/TNNLS.2020.3017675. Epub 2021 Aug 31.
5
Double-smoothing in kernel hazard rate estimation.
Methods Inf Med. 2008;47(2):167-73.
6
Online Kernel Learning With Adaptive Bandwidth by Optimal Control Approach.
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):1920-1934. doi: 10.1109/TNNLS.2020.2995482. Epub 2021 May 3.
7
Hazard function estimators: a simulation study.
Stat Med. 1999 Nov 30;18(22):3075-88. doi: 10.1002/(sici)1097-0258(19991130)18:22<3075::aid-sim244>3.0.co;2-6.
8
Adaptive kernel estimation of spatial relative risk.
Stat Med. 2010 Oct 15;29(23):2423-37. doi: 10.1002/sim.3995.
9
Kernel bandwidth optimization in spike rate estimation.
J Comput Neurosci. 2010 Aug;29(1-2):171-182. doi: 10.1007/s10827-009-0180-4. Epub 2009 Aug 5.
10
GPAT: retrieval of genomic annotation from large genomic position datasets.
BMC Bioinformatics. 2008 Dec 15;9:533. doi: 10.1186/1471-2105-9-533.

Articles citing this paper

1
F-Seq2: improving the feature density based peak caller with dynamic statistics.
NAR Genom Bioinform. 2021 Feb 23;3(1):lqab012. doi: 10.1093/nargab/lqab012. eCollection 2021 Mar.
2
RECAP reveals the true statistical significance of ChIP-seq peak calls.
Bioinformatics. 2019 Oct 1;35(19):3592-3598. doi: 10.1093/bioinformatics/btz150.
