VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

Author information

Mu John C, Mohiyuddin Marghoob, Li Jian, Bani Asadi Narges, Gerstein Mark B, Abyzov Alexej, Wong Wing H, Lam Hugo Y K

Affiliations

Department of Electrical Engineering, Stanford University, Stanford, CA 94035, USA, Department of Bioinformatics, Bina Technologies, Redwood City, CA 94065, USA, Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA, Mayo Clinics, Department of Health Sciences Research, Rochester, MN 55902, USA, Department of Statistics, Stanford University, Stanford, CA 94035, USA and Department of Health Research and Policy, Stanford University, Stanford, CA 94035, USA.

Publication information

Bioinformatics. 2015 May 1;31(9):1469-71. doi: 10.1093/bioinformatics/btu828. Epub 2014 Dec 17.

DOI: 10.1093/bioinformatics/btu828
PMID: 25524895
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4410653/
Abstract

SUMMARY

VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing.
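One idea in the summary above is comparing variants binned in size ranges, so that accuracy is reported separately for, say, SNVs, short indels and larger events instead of as one overall number. The Python sketch below illustrates that general idea under simplifying assumptions. It is not VarSim's code or API. The names `Variant`, `SIZE_BINS`, `bin_label`, `compare_by_size` and `summarize` are hypothetical, and so are the bin edges. Calls are matched to the truth set only by exact (chrom, pos, ref, alt) identity. Practical tools generally need more tolerant matching, especially for large structural variants whose breakpoints may be reported imprecisely.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical size bins (inclusive bounds, in bp); VarSim's real ranges may differ.
SIZE_BINS: List[Tuple[int, Optional[int]]] = [
    (1, 1), (2, 10), (11, 50), (51, 100), (101, 1000), (1001, None),
]


class Variant(NamedTuple):
    """A simplified VCF-style variant record (hypothetical, for illustration)."""
    chrom: str
    pos: int  # 1-based position of the first REF base
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        if len(self.ref) == len(self.alt):
            return "SNV" if len(self.ref) == 1 else "MNP"
        return "INS" if len(self.alt) > len(self.ref) else "DEL"

    @property
    def size(self) -> int:
        # Indels: number of inserted/deleted bases; SNV/MNP: substituted length.
        return abs(len(self.ref) - len(self.alt)) or len(self.ref)


def bin_label(size: int) -> str:
    """Return the label of the size bin that contains `size`."""
    for lo, hi in SIZE_BINS:
        if hi is None and size >= lo:
            return f">{lo - 1}"
        if hi is not None and lo <= size <= hi:
            return str(lo) if lo == hi else f"{lo}-{hi}"
    raise ValueError(f"variant size must be >= 1, got {size}")


BinKey = Tuple[str, str]  # (variant kind, size-bin label)


def compare_by_size(truth: Iterable[Variant], calls: Iterable[Variant]) -> Dict[BinKey, Counter]:
    """Count TP/FN/FP per (kind, size bin), matching variants by exact identity."""
    truth_set, call_set = set(truth), set(calls)
    stats: Dict[BinKey, Counter] = defaultdict(Counter)
    for v in truth_set:
        # A true variant is a TP if the caller reported the identical allele.
        stats[(v.kind, bin_label(v.size))]["TP" if v in call_set else "FN"] += 1
    for v in call_set - truth_set:
        # Calls absent from the truth set are false positives.
        stats[(v.kind, bin_label(v.size))]["FP"] += 1
    return stats


def summarize(stats: Dict[BinKey, Counter]) -> None:
    """Print per-bin counts with sensitivity and precision (nan if undefined)."""
    for (kind, label), c in sorted(stats.items()):
        tp, fn, fp = c["TP"], c["FN"], c["FP"]
        sens = tp / (tp + fn) if tp + fn else float("nan")
        prec = tp / (tp + fp) if tp + fp else float("nan")
        print(f"{kind:<4}{label:>10}  TP={tp} FN={fn} FP={fp}  "
              f"sensitivity={sens:.2f} precision={prec:.2f}")


# Toy truth set (e.g. variants planted in a simulated genome) and toy calls.
truth = [
    Variant("1", 100, "A", "G"),             # SNV
    Variant("1", 200, "AT", "A"),            # 1-bp deletion
    Variant("1", 500, "A", "A" + "C" * 60),  # 60-bp insertion
]
calls = [
    Variant("1", 100, "A", "G"),             # true positive
    Variant("1", 300, "G", "T"),             # false positive
    Variant("1", 500, "A", "A" + "C" * 60),  # true positive
]
stats = compare_by_size(truth, calls)
summarize(stats)
```

In this toy run the missed 1-bp deletion shows up in its own DEL bin rather than being averaged away by the correctly called SNV and insertion. That kind of separation is what makes per-size reporting useful when a caller's sensitivity depends on event length.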

AVAILABILITY AND IMPLEMENTATION

Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim.

CONTACT

rd@bina.com

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95ef/4410653/8e01e0b93c6b/btu828f1p.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95ef/4410653/dc3ef8ce9a5f/btu828f2p.jpg
