
A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics.

Affiliations

Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic.

Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic.

Publication information

Mol Ecol. 2021 Dec;30(23):6021-6035. doi: 10.1111/mec.16240. Epub 2021 Oct 31.

DOI: 10.1111/mec.16240
PMID: 34674330
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9298010/

Abstract

The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set at three different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that choosing multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
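
The coverage levels named in the abstract (10×, 5× and 2×) can be related to read counts with simple arithmetic. The sketch below is only an illustration and is not taken from secapr: the function names, the read count, the read length and the genome size are all hypothetical values chosen for the example. It computes the mean read depth of a library and the fraction of reads one would keep to reach a lower target depth.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean read depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def subsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep so the expected depth equals target_cov."""
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverages must be positive")
    if target_cov > current_cov:
        raise ValueError("cannot subsample up to a higher coverage")
    return target_cov / current_cov


# Hypothetical library: 20 million reads of 150 bp on a 300 Mb genome.
n_reads = 20_000_000
read_length = 150
genome_size = 300_000_000

cov = mean_coverage(n_reads, read_length, genome_size)

# Fractions of reads to retain for the three depths mentioned in the abstract.
fractions = {target: subsample_fraction(cov, target) for target in (10, 5, 2)}

for target, frac in fractions.items():
    print(f"{target}x: keep {frac:.0%} of reads")
```

With these assumed numbers the library is at 10× depth, so reaching 5× or 2× means keeping half or one fifth of the reads, respectively. Real data sets would need the actual read count, read length and an estimate of genome size.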

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/1e83a5c09e6a/MEC-30-6021-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/c926806ada92/MEC-30-6021-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/f2c9cf652694/MEC-30-6021-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/54b128ac5282/MEC-30-6021-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/86ca1b2fd64c/MEC-30-6021-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffd6/9298010/073fc4678569/MEC-30-6021-g006.jpg
