Department of Mathematics, Bar Ilan University, Ramat Gan, Israel.
PLoS Comput Biol. 2021 Jul 26;17(7):e1009225. doi: 10.1371/journal.pcbi.1009225. eCollection 2021 Jul.
Recent advances in T cell receptor (TCR) repertoire sequencing allow for the characterization of repertoire properties, as well as the frequency and sharing of specific TCRs. However, there is no efficient measure for the local density of a given TCR. TCRs are often described through their complementarity-determining region 3 (CDR3) sequences, their V/J usage, or their clone size. Here we show that the local repertoire density can be estimated using a combined representation of these components through distance-conserving autoencoders and kernel density estimates (KDE). We present ELATE, an Encoder-based LocAl Tcr dEnsity, and show that the resulting density of a sample can be used as a novel measure to study repertoire properties. The cross-density between two samples can be used as a similarity matrix to fully characterize samples from the same host. Finally, the same projection, in combination with machine learning algorithms, can be used to predict TCR-peptide binding through the local density of known TCRs binding a specific target.
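To make the density idea concrete, the following is a minimal sketch of a Gaussian kernel density estimate over embedded TCRs, plus a simple cross-density between two samples. It is not the ELATE implementation: the encoder is omitted, the 2-D embeddings are made-up toy values, and the function names, the isotropic Gaussian kernel, the fixed bandwidth, and the choice to average densities in `cross_density` are all assumptions made for illustration.

```python
import math

def gaussian_kde_density(query, points, bandwidth=1.0):
    """Estimate the density at `query` as the mean isotropic Gaussian
    kernel between `query` and every embedded point in `points`."""
    dim = len(query)
    # Normalizing constant of a dim-dimensional isotropic Gaussian.
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for point in points:
        sq_dist = sum((q - x) ** 2 for q, x in zip(query, point))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(points) * norm)

def cross_density(sample_a, sample_b, bandwidth=1.0):
    """Mean density of the points of `sample_a` under a KDE built
    on `sample_b` (an assumed, simplified notion of cross-density)."""
    densities = [gaussian_kde_density(a, sample_b, bandwidth) for a in sample_a]
    return sum(densities) / len(densities)

# Hypothetical 2-D embeddings standing in for encoder outputs.
repertoire = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
distant_sample = [(10.0, 10.0)]

# A TCR embedded near a cluster vs. one embedded far from the others.
dense = gaussian_kde_density((0.05, 0.05), repertoire, bandwidth=0.5)
sparse = gaussian_kde_density((3.0, 3.0), repertoire, bandwidth=0.5)

self_similarity = cross_density(repertoire, repertoire, bandwidth=0.5)
far_similarity = cross_density(repertoire, distant_sample, bandwidth=0.5)
```

In this toy setting the point near the cluster of three embeddings receives a higher density than the isolated one, and the repertoire is more similar to itself than to the distant sample. In practice the bandwidth and the embedding dimension would have to be chosen for the data at hand.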