A model-based circular binary segmentation algorithm for the analysis of array CGH data.

Authors

Hsu Fang-Han, Chen Hung-I H, Tsai Mong-Hsun, Lai Liang-Chuan, Huang Chi-Cheng, Tu Shih-Hsin, Chuang Eric Y, Chen Yidong

Affiliation

Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan.

Publication

BMC Res Notes. 2011 Oct 10;4:394. doi: 10.1186/1756-0500-4-394.

DOI: 10.1186/1756-0500-4-394
PMID: 21985277
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3224564/
Abstract

BACKGROUND

Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself.
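
To make the cost of the permutation step concrete, here is a minimal sketch of a maximal-t style statistic with a permutation p-value. It is an illustration under stated assumptions, not the code of CBS or of the hybrid CBS: the function names `max_t_statistic` and `permutation_p_value`, the permutation count, and the simulated profile are all hypothetical, and the variance scaling of the statistic is left out because every shuffled copy contains the same values.

```python
import math
import random


def max_t_statistic(x):
    """Largest absolute arc-versus-rest statistic over all arcs of the data."""
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    total = prefix[n]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i  # number of probes inside the arc x[i:j]
            if k == n:
                continue  # the arc must leave at least one probe outside
            inside = prefix[j] - prefix[i]
            diff = inside / k - (total - inside) / (n - k)
            best = max(best, abs(diff) / math.sqrt(1.0 / k + 1.0 / (n - k)))
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Share of shuffled copies whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = max_t_statistic(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_t_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


# Hypothetical profile: 40 probes of noise with a gain on probes 15 to 24.
noise = random.Random(1)
profile = [noise.gauss(0.0, 0.2) for _ in range(40)]
for idx in range(15, 25):
    profile[idx] += 1.0
p_value = permutation_p_value(profile)
```

Each call to `permutation_p_value` recomputes the quadratic-time statistic once per shuffle, which is the burden the abstract refers to.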

RESULTS

We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process.
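
The table-lookup idea can be sketched as follows. This is a simplified illustration and not the eXtreme model itself: it assumes Gaussian null data with unit variance (the abstract describes non-normal assumptions), keys the table by the number of probes only, and fixes the GEV shape parameter at zero (the Gumbel special case) so that the location and scale can be estimated by the method of moments. All function names and parameters are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def max_t_statistic(x):
    """Largest absolute arc-versus-rest statistic (unit noise variance assumed)."""
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue
            inside = prefix[j] - prefix[i]
            diff = inside / k - (prefix[n] - inside) / (n - k)
            best = max(best, abs(diff) / math.sqrt(1.0 / k + 1.0 / (n - k)))
    return best


def gev_survival(x, mu, sigma, xi):
    """P(X > x) for a GEV variable with location mu, scale sigma, shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return -math.expm1(-math.exp(-z))  # Gumbel limit of the GEV
    base = 1.0 + xi * z
    if base <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 1.0 if xi > 0 else 0.0
    return -math.expm1(-(base ** (-1.0 / xi)))


def build_lookup_table(lengths, n_sim=300, seed=0):
    """Simulate null data once per length and store moment-matched parameters."""
    rng = random.Random(seed)
    table = {}
    for n in lengths:
        stats = [
            max_t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_sim)
        ]
        sigma = statistics.stdev(stats) * math.sqrt(6.0) / math.pi
        mu = statistics.mean(stats) - EULER_GAMMA * sigma
        table[n] = (mu, sigma, 0.0)  # shape fixed at zero in this sketch
    return table


def lookup_p_value(table, n, observed):
    """Constant-time significance: one dictionary lookup, one closed-form tail."""
    mu, sigma, xi = table[n]
    return gev_survival(observed, mu, sigma, xi)


# Hypothetical use: build the table once, then score a profile with a gain.
table = build_lookup_table([30])
rng = random.Random(7)
profile = [rng.gauss(0.0, 1.0) for _ in range(30)]
for idx in range(10, 20):
    profile[idx] += 4.0
p_value = lookup_p_value(table, 30, max_t_statistic(profile))
```

The simulation cost is paid once when the table is built; scoring a candidate change-point afterwards needs no further permutations.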

CONCLUSIONS

A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97ad/3224564/eaf859f41343/1756-0500-4-394-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97ad/3224564/fccb2d158e5d/1756-0500-4-394-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97ad/3224564/ae20fdf6ab9a/1756-0500-4-394-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97ad/3224564/3e4d8efa25a6/1756-0500-4-394-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97ad/3224564/bb464ca4005f/1756-0500-4-394-5.jpg

Similar articles

1. A model-based circular binary segmentation algorithm for the analysis of array CGH data.
   BMC Res Notes. 2011 Oct 10;4:394. doi: 10.1186/1756-0500-4-394.
2. A faster circular binary segmentation algorithm for the analysis of array CGH data.
   Bioinformatics. 2007 Mar 15;23(6):657-63. doi: 10.1093/bioinformatics/btl646. Epub 2007 Jan 18.
3. A probe-density-based analysis method for array CGH data: simulation, normalization and centralization.
   Bioinformatics. 2008 Aug 15;24(16):1749-56. doi: 10.1093/bioinformatics/btn321. Epub 2008 Jul 4.
4. Robust smooth segmentation approach for array CGH data analysis.
   Bioinformatics. 2007 Sep 15;23(18):2463-9. doi: 10.1093/bioinformatics/btm359. Epub 2007 Jul 27.
5. Modified screening and ranking algorithm for copy number variation detection.
   Bioinformatics. 2015 May 1;31(9):1341-8. doi: 10.1093/bioinformatics/btu850. Epub 2014 Dec 25.
6. An integrated analysis tool for analyzing hybridization intensities and genotypes using new-generation population-optimized human arrays.
   BMC Genomics. 2016 Mar 31;17:266. doi: 10.1186/s12864-016-2478-8.
7. A fast and flexible method for the segmentation of aCGH data.
   Bioinformatics. 2008 Aug 15;24(16):i139-45. doi: 10.1093/bioinformatics/btn272.
8. A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data.
   BMC Genomics. 2011 Dec 23;12 Suppl 5(Suppl 5):S10. doi: 10.1186/1471-2164-12-S5-S10.
9. A very fast and accurate method for calling aberrations in array-CGH data.
   Biostatistics. 2010 Jul;11(3):515-8. doi: 10.1093/biostatistics/kxq008. Epub 2010 Mar 5.
10. Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations.
    IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1625-1635. doi: 10.1109/TCBB.2017.2723884. Epub 2017 Jul 6.

Cited by

1. Enabling sensitive and precise detection of ctDNA through somatic copy number aberrations in breast cancer.
   NPJ Breast Cancer. 2025 Mar 8;11(1):25. doi: 10.1038/s41523-025-00739-6.
2. Mapping the micro-proteome of the nuclear lamina and lamina-associated domains.
   Life Sci Alliance. 2021 Mar 23;4(5). doi: 10.26508/lsa.202000774. Print 2021 May.
3. A genomic approach to study down syndrome and cancer inverse comorbidity: untangling the chromosome 21.
   Front Physiol. 2015 Feb 4;6:10. doi: 10.3389/fphys.2015.00010. eCollection 2015.
4. Reducing confounding and suppression effects in TCGA data: an integrated analysis of chemotherapy response in ovarian cancer.
   BMC Genomics. 2012;13 Suppl 6(Suppl 6):S13. doi: 10.1186/1471-2164-13-S6-S13. Epub 2012 Oct 26.
5. Current analysis platforms and methods for detecting copy number variation.
   Physiol Genomics. 2013 Jan 7;45(1):1-16. doi: 10.1152/physiolgenomics.00082.2012. Epub 2012 Nov 6.

References

1. Integrated genomic analyses of ovarian carcinoma.
   Nature. 2011 Jun 29;474(7353):609-15. doi: 10.1038/nature10166.
2. A Bayesian segmentation approach to ascertain copy number variations at the population level.
   Bioinformatics. 2009 Jul 1;25(13):1669-79. doi: 10.1093/bioinformatics/btp270. Epub 2009 Apr 23.
3. A note on oligonucleotide expression values not being normally distributed.
   Biostatistics. 2009 Jul;10(3):446-50. doi: 10.1093/biostatistics/kxp003. Epub 2009 Mar 10.
4. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing.
   Nat Genet. 2008 Jun;40(6):722-9. doi: 10.1038/ng.128. Epub 2008 Apr 27.
5. High-resolution, dual-platform aCGH analysis reveals frequent HIPK2 amplification and increased expression in pilocytic astrocytomas.
   Oncogene. 2008 Aug 7;27(34):4745-51. doi: 10.1038/onc.2008.110. Epub 2008 Apr 14.
6. Sparse representation and Bayesian detection of genome copy number alterations from microarray data.
   Bioinformatics. 2008 Feb 1;24(3):309-18. doi: 10.1093/bioinformatics/btm601. Epub 2008 Jan 18.
7. Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability.
   Nat Rev Genet. 2007 Aug;8(8):639-46. doi: 10.1038/nrg2149.
8. A faster circular binary segmentation algorithm for the analysis of array CGH data.
   Bioinformatics. 2007 Mar 15;23(6):657-63. doi: 10.1093/bioinformatics/btl646. Epub 2007 Jan 18.
9. BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data.
   Bioinformatics. 2006 May 1;22(9):1144-6. doi: 10.1093/bioinformatics/btl089. Epub 2006 Mar 13.
10. A comparison study: applying segmentation to array CGH data for downstream analyses.
    Bioinformatics. 2005 Nov 15;21(22):4084-91. doi: 10.1093/bioinformatics/bti677. Epub 2005 Sep 13.