Joining Illumina paired-end reads for classifying phylogenetic marker sequences.

Affiliations

Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, 701, Taiwan.

Molecular Diagnostic Laboratory, Department of Pathology, National Cheng Kung University Hospital, Tainan, Taiwan.

Publication information

BMC Bioinformatics. 2020 Mar 14;21(1):105. doi: 10.1186/s12859-020-3445-6.

DOI:10.1186/s12859-020-3445-6
PMID:32171248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7071698/
Abstract

BACKGROUND

Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only first reads for taxonomy annotation, but that wastes information in the second reads. Presumably, including second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported.

RESULTS

We evaluated two methods of joining, as opposed to merging, PE reads into single reads for taxonomy annotation, using simulated data with sequencing errors. Our rigorous evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own real MiSeq PE data from the nasal microbiota of asthmatic children. Before joining, trimming low-quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined, trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation.
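
To make the idea of trimming and then joining a read pair concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the two joining methods, so the orientation used here (read 1 followed by the reverse complement of read 2), the optional spacer, the quality threshold of 20, and all function names are hypothetical choices.

```python
from typing import Sequence

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_low_quality_tail(seq: str, quals: Sequence[int], min_q: int = 20) -> str:
    """Cut the read at the first base whose Phred score is below min_q.

    Assumption: a simple cut-at-first-bad-base rule; real trimmers
    often use sliding windows or expected-error thresholds instead.
    """
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i]
    return seq


def join_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Join a read pair into one sequence without requiring an overlap.

    Assumption: read 2 is reverse-complemented so both parts lie on the
    same strand, then appended to read 1 (optionally after a spacer).
    """
    return read1 + spacer + reverse_complement(read2)


# Toy read pair with per-base Phred scores (made-up values).
r1 = trim_low_quality_tail("ACGTTGCA", [38, 38, 37, 36, 35, 30, 12, 8])
r2 = trim_low_quality_tail("AAGGTC", [38, 37, 36, 35, 34, 33])
joined = join_pair(r1, r2)
print(joined)
```

In this sketch the joined sequence skips the unsequenced middle of the amplicon, which is one way to see why an alignment-based classifier might need correspondingly rearranged reference sequences, while a word-counting classifier can still match the words of each part against the original reference.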

CONCLUSIONS

When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Reference sequences may need to be rearranged accordingly depending on the classifier. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241b/7071698/28dc792b7755/12859_2020_3445_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241b/7071698/25dbb1ef13a1/12859_2020_3445_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241b/7071698/f0afa9317c49/12859_2020_3445_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241b/7071698/f21cef49f6bb/12859_2020_3445_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241b/7071698/4580b347a30b/12859_2020_3445_Fig5_HTML.jpg
