BC Cancer Agency, Michael Smith Genome Sciences Centre, Vancouver, British Columbia V5Z 1L3, Canada.
Genome Res. 2011 May;21(5):790-7. doi: 10.1101/gr.115428.110. Epub 2011 Feb 24.
Massively parallel sequencing is a useful approach for characterizing T-cell receptor diversity. However, immune receptors are extraordinarily difficult sequencing targets because any given receptor variant may be present at very low abundance and may legitimately differ from another variant by only a single nucleotide. We show that the sensitivity of sequence-based repertoire profiling is limited by both sequencing depth and sequencing accuracy. At two time points, 1 week apart, we isolated bulk PBMCs as well as naïve (CD45RA+/CD45RO-) and memory (CD45RA-/CD45RO+) T-cell subsets from a healthy donor. From T-cell receptor beta chain (TCRB) mRNA we constructed and sequenced multiple libraries to obtain a total of 1.7 billion paired sequence reads. The sequencing error rate was determined empirically and used to inform a high-stringency data filtering procedure. The error-filtered data yielded 1,061,522 distinct TCRB nucleotide sequences from this subject, which establishes a new, directly measured lower limit on individual T-cell repertoire size and provides a useful reference set of sequences for repertoire analysis. TCRB nucleotide sequences obtained from two additional donors were compared with those from the first donor and revealed limited sharing (up to 1.1%) of nucleotide sequences among donors, but substantially higher sharing (up to 14.2%) of inferred amino acid sequences. For each donor, shared amino acid sequences were encoded by a much larger diversity of nucleotide sequences than were unshared amino acid sequences. We also observed a highly statistically significant association between the number of shared sequences and the number of shared HLA class I alleles.
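The contrast between nucleotide-level and amino-acid-level sharing can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`translate`, `sharing`), the toy sequences, and the definition of sharing as the fraction of one donor's distinct sequences also found in another donor are all assumptions made for illustration. The sketch only shows how distinct nucleotide sequences can collapse onto the same inferred amino acid sequence under the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence into one-letter amino acids."""
    return "".join(
        CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, len(nt_seq) - 2, 3)
    )


def sharing(donor_a, donor_b):
    """Fraction of donor A's distinct sequences also seen in donor B.

    This definition of sharing is an assumption for the example.
    """
    a, b = set(donor_a), set(donor_b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical toy repertoires (not data from the study).
donor_a_nt = {"TGTGCCAGCAGT", "TGTGCCAGCTTA", "TGTGCCAGCAAA"}
donor_b_nt = {"TGCGCTTCTTCC", "TGTGCCAGCCTG", "TGTGCCAGCGAA"}

donor_a_aa = {translate(s) for s in donor_a_nt}
donor_b_aa = {translate(s) for s in donor_b_nt}

nt_share = sharing(donor_a_nt, donor_b_nt)
aa_share = sharing(donor_a_aa, donor_b_aa)

print(f"nucleotide sharing: {nt_share:.2f}")
print(f"amino acid sharing: {aa_share:.2f}")
```

In this toy case the two donors share no nucleotide sequences, yet two of the three inferred amino acid sequences coincide, because synonymous codons map different nucleotide sequences to the same peptide.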