A novel statistical method for decontaminating T-cell receptor sequencing data.

Author affiliations

Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA.

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.

Publication information

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad230.

Abstract

The T-cell receptor (TCR) repertoire is highly diverse among the population and plays an essential role in initiating multiple immune processes. TCR sequencing (TCR-seq) has been developed to profile the T cell repertoire. Similar to other high-throughput experiments, contamination can happen during several steps of TCR-seq, including sample collection, preparation and sequencing. Such contamination creates artifacts in the data, leading to inaccurate or even biased results. Most existing methods assume 'clean' TCR-seq data as the starting point with no ability to handle data contamination. Here, we develop a novel statistical model to systematically detect and remove contamination in TCR-seq data. We summarize the observed contamination into two sources, pairwise and cross-cohort. For both sources, we provide visualizations and summary statistics to help users assess the severity of the contamination. Incorporating prior information from 14 existing TCR-seq datasets with minimum contamination, we develop a straightforward Bayesian model to statistically identify contaminated samples. We further provide strategies for removing the impacted sequences to allow for downstream analysis, thus avoiding any need to repeat experiments. Our proposed model shows robustness in contamination detection compared with a few off-the-shelf detection methods in simulation studies. We illustrate the use of our proposed method on two TCR-seq datasets generated locally.
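
The abstract mentions summary statistics for assessing pairwise contamination but does not define them. As a rough illustration only, the sketch below computes a Jaccard index on the distinct receptor sequences shared by each pair of samples and flags pairs above a cutoff. The function names, the toy CDR3 sequences, and the threshold of 0.3 are all assumptions made for this example; it is not the authors' Bayesian model, and it does not use the prior information from the 14 reference datasets.

```python
from itertools import combinations


def pairwise_sharing(samples):
    """Return {(a, b): Jaccard index of the distinct sequences in samples a and b}.

    `samples` maps a sample name to an iterable of receptor sequences
    (hypothetical input layout, chosen for this sketch).
    """
    sets = {name: set(seqs) for name, seqs in samples.items()}
    result = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        shared = sets[a] & sets[b]
        # Guard against two empty samples to avoid division by zero.
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result


def flag_pairs(sharing, threshold):
    """Return the sample pairs whose sharing exceeds an arbitrary cutoff."""
    return sorted(pair for pair, value in sharing.items() if value > threshold)


# Toy data: S1 and S2 share two of four distinct sequences; S3 shares none.
samples = {
    "S1": ["CASSLGQAYEQYF", "CASSPGTGGYEQYF", "CASSIRSSYEQYF"],
    "S2": ["CASSLGQAYEQYF", "CASSPGTGGYEQYF", "CASSQDRGNTEAFF"],
    "S3": ["CASSYSGNTIYF"],
}

sharing = pairwise_sharing(samples)
flagged = flag_pairs(sharing, threshold=0.3)  # cutoff is an assumption
```

A fixed cutoff like this is the kind of simple rule the abstract contrasts with a model-based approach: unrelated individuals share some sequences naturally, so a realistic detector needs a baseline for expected sharing rather than a hand-picked threshold.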
