

iSeg: an efficient algorithm for segmentation of genomic and epigenomic data.

Affiliations

Department of Mathematics, Florida Gulf Coast University, Fort Myers, FL, USA.

Department of Statistics, Florida State University, Tallahassee, FL, USA.

Publication information

BMC Bioinformatics. 2018 Apr 11;19(1):131. doi: 10.1186/s12859-018-2140-3.

DOI: 10.1186/s12859-018-2140-3
PMID: 29642840
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896135/
Abstract

BACKGROUND

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems.
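To make the segmentation problem concrete, the following is a minimal sketch of one classic formulation, not iSeg's method. It uses optimal partitioning: dynamic programming that splits a profile into segments with different means by minimizing the within-segment sum of squared errors plus a fixed `penalty` per segment. iSeg instead uses a p-value-based objective, as described under RESULTS. The function name `optimal_partition` and the `penalty` parameter are hypothetical and exist only for illustration.

```python
from typing import List


def optimal_partition(y: List[float], penalty: float) -> List[int]:
    """Hypothetical helper: return the exclusive end index of each segment.

    Minimizes (sum of within-segment squared errors) + penalty * (#segments)
    with O(n^2) dynamic programming (classic optimal partitioning, not iSeg).
    """
    n = len(y)
    # Prefix sums of y and y^2 give each segment's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations of y[i:j] from its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[t]: optimal cost of y[:t]
    last = [0] * (n + 1)               # last[t]: start of the final segment of y[:t]
    for t in range(1, n + 1):
        for s in range(t):
            c = best[s] + cost(s, t) + penalty
            if c < best[t]:
                best[t], last[t] = c, s

    # Walk back through the stored segment starts to recover the boundaries.
    ends = []
    t = n
    while t > 0:
        ends.append(t)
        t = last[t]
    return ends[::-1]


if __name__ == "__main__":
    profile = [0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0]
    print(optimal_partition(profile, penalty=1.0))  # [4, 8, 12]
```

A larger `penalty` trades goodness of fit for fewer segments. The quadratic double loop is fine for a toy profile but far too slow for genome-length data, which is the speed concern raised above.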

RESULTS

We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first utilizes dynamic programming to identify candidate segments and test for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. By using an objective function based on the p-values of the segments, the algorithm can serve as a general computational framework to be combined with different assumptions on the distributions of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg using both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient, well suited for large numbers of input profiles and data with very long sequences.
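The paragraph above describes two ideas: scoring candidate segments by p-values from simple t-tests, and keeping a set of non-overlapping significant segments. The sketch below illustrates both under several stated assumptions. Each window is tested with a two-sided one-sample t-test against an assumed background mean of 0. Candidates come from an exhaustive scan of windows up to `max_len`, not from iSeg's dynamic programming. A sorted list searched with `bisect` stands in for the two coupled balanced binary trees. Refinement and merging are left out. The names `t_test_pvalue`, `significant_segments`, `min_len`, `max_len` and `alpha` are hypothetical and are not iSeg's C++ interface. The Student's t p-value is computed through the standard regularized incomplete beta identity, so only the Python standard library is needed.

```python
import math
from bisect import bisect_left
from typing import List, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny, eps = 1e-300, 3e-14
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_test_pvalue(values: List[float], mu0: float = 0.0) -> float:
    """Two-sided one-sample t-test p-value for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("a t-test needs at least two values")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0.0:
        return 0.0 if mean != mu0 else 1.0
    t = (mean - mu0) / math.sqrt(var / n)
    df = n - 1
    # P(|T| >= |t|) for Student's t with df degrees of freedom.
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def significant_segments(y: List[float], min_len: int = 3, max_len: int = 10,
                         alpha: float = 1e-3) -> List[Tuple[int, int, float]]:
    """Hypothetical sketch: keep the most significant non-overlapping windows.

    Returns (start, end, p) tuples sorted by start, with `end` exclusive.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    candidates = []
    for i in range(len(y)):
        for j in range(i + min_len, min(i + max_len, len(y)) + 1):
            p = t_test_pvalue(y[i:j])
            if p < alpha:
                candidates.append((p, i, j))
    candidates.sort()  # smallest p-value (most significant) first
    # Accepted segments stay sorted by start and pairwise disjoint.
    starts: List[int] = []
    ends: List[int] = []
    pvals: List[float] = []
    for p, i, j in candidates:
        k = bisect_left(starts, i)
        # In a sorted disjoint set only the neighbours at k-1 and k can overlap [i, j).
        if k > 0 and ends[k - 1] > i:
            continue
        if k < len(starts) and starts[k] < j:
            continue
        starts.insert(k, i)
        ends.insert(k, j)
        pvals.insert(k, p)
    return list(zip(starts, ends, pvals))


if __name__ == "__main__":
    background = [0.1, -0.1] * 3
    profile = background + [3.0, 3.2, 2.9, 3.1, 3.0, 3.2] + background
    for start, end, p in significant_segments(profile):
        print(f"[{start}, {end})  p = {p:.2e}")
```

Lowering `alpha` makes the calls stricter, loosely analogous to the tunable statistical stringency mentioned in the conclusions. Because the test is two-sided, windows with negative means are scored the same way as windows with positive means.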

CONCLUSIONS

We have developed an efficient general-purpose segmentation tool and showed that it had comparable or more accurate results than many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg is capable of analyzing datasets that have both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets or those with non-normal distributions.


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/f3b5f8fabc41/12859_2018_2140_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/21b375eaa08f/12859_2018_2140_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/1fe133837702/12859_2018_2140_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/480a8fa79027/12859_2018_2140_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/0f67fbb3461c/12859_2018_2140_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/7e80355130cb/12859_2018_2140_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/3eae770f3aa3/12859_2018_2140_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/842eb962c136/12859_2018_2140_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/034776c71568/12859_2018_2140_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/cf9d3a5ba941/12859_2018_2140_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d089/5896135/87dd92549957/12859_2018_2140_Fig11_HTML.jpg

Similar articles

1
iSeg: an efficient algorithm for segmentation of genomic and epigenomic data.
BMC Bioinformatics. 2018 Apr 11;19(1):131. doi: 10.1186/s12859-018-2140-3.
2
Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes.
BMC Bioinformatics. 2020 Apr 16;21(1):147. doi: 10.1186/s12859-020-3480-3.
3
MethCNA: a database for integrating genomic and epigenomic data in human cancer.
BMC Genomics. 2018 Feb 13;19(1):138. doi: 10.1186/s12864-018-4525-0.
4
DBS: a fast and informative segmentation algorithm for DNA copy number analysis.
BMC Bioinformatics. 2019 Jan 3;20(1):1. doi: 10.1186/s12859-018-2565-8.
5
SLMSuite: a suite of algorithms for segmenting genomic profiles.
BMC Bioinformatics. 2017 Jun 28;18(1):321. doi: 10.1186/s12859-017-1734-5.
6
biomvRhsmm: genomic segmentation with hidden semi-Markov model.
Biomed Res Int. 2014;2014:910390. doi: 10.1155/2014/910390. Epub 2014 Jun 3.
7
A comparison study: applying segmentation to array CGH data for downstream analyses.
Bioinformatics. 2005 Nov 15;21(22):4084-91. doi: 10.1093/bioinformatics/bti677. Epub 2005 Sep 13.
8
Large-scale genomic prediction using singular value decomposition of the genotype matrix.
Genet Sel Evol. 2018 Feb 28;50(1):6. doi: 10.1186/s12711-018-0373-2.
9
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.
BMC Genomics. 2017 May 15;18(1):379. doi: 10.1186/s12864-017-3746-y.
10
An Entropy-Regularized Framework for Detecting Copy Number Variants.
IEEE Trans Biomed Eng. 2019 Mar;66(3):682-688. doi: 10.1109/TBME.2018.2854628. Epub 2018 Jul 9.

Cited by

1
Evolutionary Dynamics of Chromatin Structure and Duplicate Gene Expression in Diploid and Allopolyploid Cotton.
Mol Biol Evol. 2024 May 3;41(5). doi: 10.1093/molbev/msae095.
2
DeepRegFinder: deep learning-based regulatory elements finder.
Bioinform Adv. 2024 Jan 14;4(1):vbae007. doi: 10.1093/bioadv/vbae007. eCollection 2024.
Bioinform Adv. 2024 Jan 14;4(1):vbae007. doi: 10.1093/bioadv/vbae007. eCollection 2024.
3
Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns.
PLoS Comput Biol. 2021 Oct 14;17(10):e1009423. doi: 10.1371/journal.pcbi.1009423. eCollection 2021 Oct.
4
NucHMM: a method for quantitative modeling of nucleosome organization identifying functional nucleosome states distinctly associated with splicing potentiality.
Genome Biol. 2021 Aug 26;22(1):250. doi: 10.1186/s13059-021-02465-1.
5
The native cistrome and sequence motif families of the maize ear.
PLoS Genet. 2021 Aug 12;17(8):e1009689. doi: 10.1371/journal.pgen.1009689. eCollection 2021 Aug.
6
Differential chromatin accessibility landscape reveals structural and functional features of the allopolyploid wheat chromosomes.
Genome Biol. 2020 Jul 19;21(1):176. doi: 10.1186/s13059-020-02093-1.
7
The regulatory landscape of early maize inflorescence development.
Genome Biol. 2020 Jul 6;21(1):165. doi: 10.1186/s13059-020-02070-8.
8
Chromatin structure profile data from DNS-seq: Differential nuclease sensitivity mapping of four reference tissues of B73 maize (Zea mays L).
Data Brief. 2018 Aug 10;20:358-363. doi: 10.1016/j.dib.2018.08.015. eCollection 2018 Oct.