Cinelli Mattia, Sun Yuxin, Best Katharine, Heather James M, Reich-Zeliger Shlomit, Shifrut Eric, Friedman Nir, Shawe-Taylor John, Chain Benny
Division of Infection and Immunity.
Department of Computer Science.
Bioinformatics. 2017 Apr 1;33(7):951-955. doi: 10.1093/bioinformatics/btw771.
Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study, we use high-throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with either ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone.
The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids (motifs). The motifs were ranked according to a one-dimensional Bayesian classifier score that compares their frequency in the repertoires of the two immunization classes. The top-ranking motifs were selected and used to build feature vectors, which in turn were used to train a support vector machine. In a leave-one-out validation test, the support vector machine achieved high classification scores, reaching >90% in some cases.
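The decomposition step described above can be illustrated with a minimal Python sketch. It assumes a repertoire is simply a list of CDR3β amino acid strings and that motif frequency means the share of all k-mers in that repertoire; the function names (`kmers`, `motif_frequencies`, `feature_vector`) and the toy sequences are hypothetical and are not taken from the study or from Decombinator.

```python
from collections import Counter


def kmers(cdr3, k=3):
    """Return all overlapping contiguous k-mers of one CDR3 amino acid sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def motif_frequencies(repertoire, k=3):
    """Map each k-mer to its relative frequency across a repertoire.

    Assumption: frequency = count of the k-mer / total k-mers in the repertoire.
    """
    counts = Counter()
    for cdr3 in repertoire:
        counts.update(kmers(cdr3, k))
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def feature_vector(freqs, motifs):
    """Frequencies of the selected motifs, in a fixed order (0.0 if absent)."""
    return [freqs.get(motif, 0.0) for motif in motifs]


# Toy repertoire (made-up sequences, for illustration only).
toy_repertoire = ["CASSLGQ", "CASSQDR"]
freqs = motif_frequencies(toy_repertoire)
print(kmers("CASSLGQ"))
print(feature_vector(freqs, ["CAS", "SLG", "WWW"]))
```

One such feature vector per animal, restricted to the top-ranking motifs, would then serve as the input to the support vector machine.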
The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach, we demonstrate that the frequencies of a small number of linear motifs, each three amino acids in length, can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens that characterizes complete Freund's adjuvant.
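As a rough illustration of the first stage, the sketch below ranks motifs by how well a one-dimensional classifier separates the two immunization classes using a single motif's frequency. The abstract does not specify the form of the Bayesian score, so everything here is an assumption: class-conditional Gaussians with equal priors, leave-one-out accuracy used as the ranking score, and made-up per-animal frequencies. The second stage (a support vector machine trained on the top-ranked motifs) is not shown.

```python
import math
from statistics import mean, pvariance


def gaussian_loglik(x, values, min_var=1e-12):
    """Log-density of x under a Gaussian fitted to `values` (variance floored)."""
    mu = mean(values)
    var = max(pvariance(values, mu), min_var)
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def loo_accuracy(freqs_a, freqs_b):
    """Leave-one-out accuracy of a 1-D Gaussian classifier (equal priors assumed)."""
    samples = [(x, "a") for x in freqs_a] + [(x, "b") for x in freqs_b]
    correct = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        train_a = [v for v, lab in rest if lab == "a"]
        train_b = [v for v, lab in rest if lab == "b"]
        pred = "a" if gaussian_loglik(x, train_a) >= gaussian_loglik(x, train_b) else "b"
        correct += pred == label
    return correct / len(samples)


def rank_motifs(table_a, table_b):
    """Sort motifs by descending leave-one-out accuracy.

    table_a / table_b: dict mapping motif -> list of per-animal frequencies.
    """
    scores = {m: loo_accuracy(table_a[m], table_b[m]) for m in table_a}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-animal motif frequencies (three animals per class).
ova_cfa = {"CAS": [0.20, 0.30, 0.25], "SLG": [0.10, 0.12, 0.11]}
cfa_only = {"CAS": [0.22, 0.28, 0.26], "SLG": [0.02, 0.03, 0.01]}
ranking = rank_motifs(ova_cfa, cfa_only)
print(ranking)
```

In this toy data the motif whose frequencies differ clearly between classes is ranked first, while the motif with overlapping frequencies falls behind; in practice only the top few motifs would be carried forward to the second stage.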
The sequence data are available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html.
Supplementary data are available at Bioinformatics online.