Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
Immunol Rev. 2018 Jul;284(1):148-166. doi: 10.1111/imr.12664.
Probabilistic modeling is fundamental to the statistical analysis of complex data. In addition to forming a coherent description of the data-generating process, probabilistic models enable parameter inference from given datasets. This procedure is well developed within the Bayesian framework, in which one infers probability distributions describing the extent to which various possible parameter values agree with the data. In this paper, we motivate and review probabilistic modeling for adaptive immune receptor repertoire data, and then describe progress and prospects for future work, from germline haplotyping to adaptive immune system deployment across tissues. The relevant quantities in immune sequence analysis include not only continuous parameters such as gene use frequency but also discrete objects such as B-cell clusters and lineages. Throughout this review, we explore the many opportunities for probabilistic modeling in adaptive immune receptor analysis, including settings in which the Bayesian approach holds substantial promise (especially if one is optimistic about new computational methods). From our perspective, the greatest prospects for progress in probabilistic modeling for repertoires concern ancestral sequence estimation for B-cell receptor lineages, including uncertainty from germline genotype, rearrangement, and lineage development.
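To make the Bayesian idea of inferring a distribution over a continuous parameter such as gene use frequency more concrete, here is a minimal sketch. It assumes that each sequence's V-gene call is an independent categorical draw and places a symmetric Dirichlet prior on the gene frequencies. This is a standard conjugate model chosen purely for illustration, not the method of this review. The function names, the prior value, and the gene calls are hypothetical.

```python
import random
from collections import Counter


def dirichlet_posterior(gene_calls, genes, prior=1.0):
    """Return posterior Dirichlet concentration parameters per gene.

    Assumption (illustrative only): gene calls are independent categorical
    draws with a symmetric Dirichlet(prior) prior on the frequencies. The
    model is conjugate, so the posterior is Dirichlet(prior + counts).
    """
    counts = Counter(gene_calls)
    return {g: prior + counts.get(g, 0) for g in genes}


def posterior_mean(alpha):
    """Posterior mean frequency of each gene: alpha_g / sum(alpha)."""
    total = sum(alpha.values())
    return {g: a / total for g, a in alpha.items()}


def posterior_variance(alpha):
    """Marginal Dirichlet variance: a_g (a0 - a_g) / (a0^2 (a0 + 1))."""
    a0 = sum(alpha.values())
    return {g: a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for g, a in alpha.items()}


def sample_frequencies(alpha, n_draws, seed=0):
    """Draw frequency vectors from Dirichlet(alpha) via normalized Gamma draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gammas = {g: rng.gammavariate(a, 1.0) for g, a in alpha.items()}
        total = sum(gammas.values())
        draws.append({g: x / total for g, x in gammas.items()})
    return draws


def credible_interval(draws, gene, level=0.95):
    """Equal-tailed interval for one gene's frequency from posterior draws."""
    values = sorted(d[gene] for d in draws)
    last = len(values) - 1
    lo = values[int((1 - level) / 2 * last)]
    hi = values[int((1 + level) / 2 * last)]
    return lo, hi


# Hypothetical per-sequence IGHV calls; IGHV4-34 is never observed.
genes = ["IGHV1-2", "IGHV3-23", "IGHV4-34"]
calls = ["IGHV3-23"] * 6 + ["IGHV1-2"] * 3

alpha = dirichlet_posterior(calls, genes)
means = posterior_mean(alpha)  # 4/12, 7/12 and 1/12 respectively
draws = sample_frequencies(alpha, n_draws=2000, seed=1)
for g in genes:
    lo, hi = credible_interval(draws, g)
    print(f"{g}: mean {means[g]:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

In this sketch, the unobserved gene still receives a nonzero posterior mean and a wide interval. A point estimate from raw counts (here 0/9) cannot convey that kind of uncertainty.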