NGSphy: phylogenomic simulation of next-generation sequencing data.

Affiliations

Department of Biochemistry, Genetics and Immunology.

Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.

Publication information

Bioinformatics. 2018 Jul 15;34(14):2506-2507. doi: 10.1093/bioinformatics/bty146.

DOI: 10.1093/bioinformatics/bty146
PMID: 29534152
Abstract

MOTIVATION

Advances in sequencing technologies have made it feasible to obtain massive datasets for phylogenomic inference, often consisting of large numbers of loci from multiple species and individuals. The phylogenomic analysis of next-generation sequencing (NGS) data requires a complex computational pipeline where multiple technical and methodological decisions are necessary that can influence the final tree obtained, like those related to coverage, assembly, mapping, variant calling and/or phasing.

RESULTS

To assess the influence of these variables we introduce NGSphy, an open-source tool for the simulation of Illumina reads/read counts obtained from haploid/diploid individual genomes with thousands of independent gene families evolving under a common species tree. In order to resemble real NGS experiments, NGSphy includes multiple options to model sequencing coverage (depth) heterogeneity across species, individuals and loci, including off-target or uncaptured loci. For comprehensive simulations covering multiple evolutionary scenarios, parameter values for the different replicates can be sampled from user-defined statistical distributions.
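
The layered coverage heterogeneity described above (across species, individuals and loci, with off-target loci) can be illustrated with a short sketch. This is not NGSphy's code, settings format or statistical model: the function names, the gamma-distributed multipliers and every parameter below are assumptions chosen only to show how a per-locus depth could be built from draws at each level.

```python
import random


def gamma_multiplier(rng, shape):
    """Draw a positive multiplier with mean 1 from Gamma(shape, 1/shape).

    The gamma distribution is an illustrative assumption, not NGSphy's model.
    """
    return rng.gammavariate(shape, 1.0 / shape)


def sample_coverage(n_species, n_individuals, n_loci, mean_depth=20.0,
                    shape=5.0, off_target_fraction=0.1,
                    off_target_scale=0.05, seed=0):
    """Return (depth, off_target) for a hypothetical experiment.

    depth maps (species, individual, locus) -> expected sequencing depth.
    off_target is a list of booleans, one per locus.
    """
    rng = random.Random(seed)

    # Locus-level effects are shared by every individual in the experiment.
    locus_mult = [gamma_multiplier(rng, shape) for _ in range(n_loci)]
    off_target = [rng.random() < off_target_fraction for _ in range(n_loci)]

    depth = {}
    for sp in range(n_species):
        sp_mult = gamma_multiplier(rng, shape)       # species-level variation
        for ind in range(n_individuals):
            ind_mult = gamma_multiplier(rng, shape)  # individual-level variation
            for locus in range(n_loci):
                d = mean_depth * sp_mult * ind_mult * locus_mult[locus]
                if off_target[locus]:
                    # Off-target loci keep only a small share of the depth;
                    # a scale of 0.0 would model an uncaptured locus.
                    d *= off_target_scale
                depth[(sp, ind, locus)] = d
    return depth, off_target


if __name__ == "__main__":
    depth, off_target = sample_coverage(n_species=2, n_individuals=2, n_loci=3)
    for key in sorted(depth):
        print(key, round(depth[key], 2), "off-target" if off_target[key[2]] else "")
```

Multiplying independent mean-one factors keeps the experiment-wide average near `mean_depth` while letting each level add its own spread; a replicate-level draw of `mean_depth` or `shape` from a user-chosen distribution could be layered on top in the same way.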

AVAILABILITY AND IMPLEMENTATION

Source code, full documentation and tutorials including a 'Getting started' guide are available at http://github.com/merlyescalona/ngsphy.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
