
Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets.

Affiliations

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.

Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS Comput Biol. 2018 Mar 28;14(3):e1006080. doi: 10.1371/journal.pcbi.1006080. eCollection 2018 Mar.

DOI:10.1371/journal.pcbi.1006080
PMID:29590101
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5891060/
Abstract

Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python to introduce user-defined haplotype-phased allele-specific copy number events into an existing Binary Alignment/Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof-of-principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20-100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for development and systematic benchmarking of CNV calling algorithms on locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer.
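To make the role of tumor cellularity concrete, the following is a minimal sketch of the signal a CNV caller would be expected to see for a simulated allele-specific event. It is not taken from the Bamgineer source code. It uses the standard mixture model for a diploid normal genome mixed with tumor cells, and the function name `expected_signal` and the evenly spaced cellularity levels are assumptions made for illustration (the abstract states only the range 20-100% and the number of levels).

```python
import math


def expected_signal(major, minor, cellularity):
    """Expected log2 depth ratio and B-allele frequency (BAF) at a het SNP.

    Assumes normal cells carry 1 + 1 copies, tumor cells carry
    `major` + `minor` copies, and the B allele sits on the major haplotype.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    # Average copy number across the tumor/normal mixture.
    total = cellularity * (major + minor) + (1.0 - cellularity) * 2.0
    if total == 0:
        # Homozygous deletion in a pure tumor sample: no reads expected.
        return float("-inf"), float("nan")
    # Average number of B-allele copies across the mixture.
    b_copies = cellularity * major + (1.0 - cellularity) * 1.0
    return math.log2(total / 2.0), b_copies / total


# Assumed levels: five evenly spaced values spanning 20-100%.
CELLULARITY_LEVELS = [0.2, 0.4, 0.6, 0.8, 1.0]

# 3 exemplar tumors x 10 tumor types x 5 cellularity levels.
N_BAM_FILES = 3 * 10 * len(CELLULARITY_LEVELS)

if __name__ == "__main__":
    print("simulated BAM files:", N_BAM_FILES)
    for p in CELLULARITY_LEVELS:
        log2_ratio, baf = expected_signal(major=2, minor=1, cellularity=p)
        print(f"cellularity={p:.1f} log2_ratio={log2_ratio:.3f} baf={baf:.3f}")
```

Under this model a single-copy gain becomes harder to detect as cellularity falls, which is why benchmarking across several cellularity levels is informative.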


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e72/5891060/5756547a91da/pcbi.1006080.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e72/5891060/51af46c36bcf/pcbi.1006080.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e72/5891060/177243a7b5f5/pcbi.1006080.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e72/5891060/9e794c6a8b2a/pcbi.1006080.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e72/5891060/d2e332f0ca27/pcbi.1006080.g005.jpg


