The Sum of Two Halves May Be Different from the Whole-Effects of Splitting Sequencing Samples Across Lanes.

Affiliations

Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK.

Life Sciences-Transcriptomics and Functional Genomics Lab, Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain.

Publication information

Genes (Basel). 2022 Dec 1;13(12):2265. doi: 10.3390/genes13122265.

DOI: 10.3390/genes13122265
PMID: 36553532
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9777937/
Abstract

The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability was systematically omitted: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes on bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks' properties. At the single-cell level, we concentrate on identifying cell subpopulations. We rely on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, the sequencing stochasticity adds in a weighting bias corroborated with variable sequencing depths and (yet unexplained) sequencing bias. Subsequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues.
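
The point about a reduced number of unique sequenced fragments can be illustrated with a toy simulation. This is a minimal sketch, not the authors' analysis: the library size, the skewed abundance distribution, the sequencing depth and all function names are assumptions made for illustration. It models random sampling only, not the lane-specific bias the abstract mentions, so pooling the two halves recovers the whole run exactly here.

```python
import random


def make_library(n_fragments, rng):
    """Hypothetical library: one relative abundance per fragment (skewed)."""
    return [rng.paretovariate(1.5) for _ in range(n_fragments)]


def sequence(weights, depth, rng):
    """Draw `depth` reads; each read is the index of the fragment it came from."""
    return rng.choices(range(len(weights)), weights=weights, k=depth)


def split_in_two(reads):
    """Assign the first half of the reads to lane A and the rest to lane B."""
    half = len(reads) // 2
    return reads[:half], reads[half:]


rng = random.Random(42)          # fixed seed so the run is repeatable
library = make_library(5000, rng)  # assumed library size
whole = sequence(library, 20000, rng)  # assumed total depth
lane_a, lane_b = split_in_two(whole)

seen_whole = set(whole)
seen_a, seen_b = set(lane_a), set(lane_b)

print("unique fragments, whole run:", len(seen_whole))
print("unique fragments, lane A   :", len(seen_a))
print("unique fragments, lane B   :", len(seen_b))
print("seen in one lane only      :", len(seen_a ^ seen_b))
```

Each lane, analysed on its own, can only see a subset of the fragments present in the full run, and low-abundance fragments are the ones most likely to appear in a single lane. Any divergence between pooled halves and the whole in real data would have to come from effects outside this simple model.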
