Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models.

Authors

Stephens Zachary D, Hudson Matthew E, Mainzer Liudmila S, Taschuk Morgan, Weber Matthew R, Iyer Ravishankar K

Affiliations

Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America.

Department of Crop Sciences, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America.

Publication information

PLoS One. 2016 Nov 28;11(11):e0167047. doi: 10.1371/journal.pone.0167047. eCollection 2016.

DOI: 10.1371/journal.pone.0167047
PMID: 27893777
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5125660/

Abstract

An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the "ground truth" about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads.
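
To make the idea of a read simulator with a known "ground truth" concrete, here is a minimal sketch in Python. It is not NEAT's code and does not use NEAT's interface. The function names (`mutate_reference`, `simulate_reads`), the substitution-only mutation and error models, and all parameter values are assumptions chosen for illustration. The sketch plants single-nucleotide variants in a random reference, records them as the truth set, and then samples fixed-length reads with random substitution errors.

```python
import random

BASES = "ACGT"


def mutate_reference(reference, mutation_rate, rng):
    """Plant substitutions; return (sample genome, truth list of (pos, ref, alt))."""
    mutated = list(reference)
    truth = []
    for pos, ref_base in enumerate(reference):
        if rng.random() < mutation_rate:
            # Pick an alternate base that differs from the reference base.
            alt_base = rng.choice([b for b in BASES if b != ref_base])
            mutated[pos] = alt_base
            truth.append((pos, ref_base, alt_base))
    return "".join(mutated), truth


def simulate_reads(genome, read_length, coverage, error_rate, rng):
    """Sample fixed-length reads uniformly and add substitution errors."""
    n_reads = max(1, (len(genome) * coverage) // read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        bases = list(genome[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Sequencing error: replace with a different base.
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


# Hypothetical toy inputs: a random 1 kb reference and arbitrary rates.
rng = random.Random(0)
reference = "".join(rng.choice(BASES) for _ in range(1000))
sample_genome, truth = mutate_reference(reference, mutation_rate=0.01, rng=rng)
reads = simulate_reads(sample_genome, read_length=50, coverage=10,
                       error_rate=0.005, rng=rng)
```

Because `truth` lists every planted variant, the calls made by a variant caller on `reads` could be scored against it directly. In this toy the mutation and error rates are fixed constants, whereas the abstract says NEAT's parameters can be set manually or parameterized from real datasets.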

