Consistency of VDJ Rearrangement and Substitution Parameters Enables Accurate B Cell Receptor Sequence Annotation.

Author information

Ralph Duncan K, Matsen Frederick A

Affiliation

Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.

Publication information

PLoS Comput Biol. 2016 Jan 11;12(1):e1004409. doi: 10.1371/journal.pcbi.1004409. eCollection 2016 Jan.

Abstract

VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that indeed large modern data sets suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM "factorization" strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
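To make the modeling idea concrete, here is a minimal sketch of per-base annotation with an HMM whose transition and emission probabilities are looked up from per-position tables rather than computed from a parametric formula. It is not the partis implementation and does not use its HMM compiler. It omits the D gene, handles a single V allele and a single J allele, and uses a simple self-looping insertion state. Every name and number in it (`annotate`, `V_EXIT`, `J_ENTRY`, the toy germline strings) is an assumption made for illustration.

```python
import math

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def annotate(seq, v_germ, j_germ, v_exit, j_entry, mu_v, mu_j,
             p_insert=0.5, p_n_stay=0.5):
    """Viterbi-label each base of seq as 'V', 'N' (insertion) or 'J'.

    v_exit[i]  : table of probabilities of leaving V after germline position i
    j_entry[j] : categorical distribution over the J position where J starts
    mu_v, mu_j : per-position mutation probabilities for each germline
    The path must start at V position 0 and end at the last J position.
    """
    nv, nj = len(v_germ), len(j_germ)
    states = ([("V", i) for i in range(nv)] + [("N", 0)]
              + [("J", j) for j in range(nj)])
    index = {s: k for k, s in enumerate(states)}

    def emit(state, base):
        region, pos = state
        if region == "N":
            return 0.25  # non-templated bases: uniform over A, C, G, T
        germ, mu = (v_germ, mu_v) if region == "V" else (j_germ, mu_j)
        return 1 - mu[pos] if base == germ[pos] else mu[pos] / 3

    def transitions(state):
        region, pos = state
        out = []
        if region == "V":
            leave = 1.0 if pos == nv - 1 else v_exit[pos]
            if pos < nv - 1:
                out.append((("V", pos + 1), 1 - leave))
            out.append((("N", 0), leave * p_insert))
            out += [(("J", j), leave * (1 - p_insert) * j_entry[j])
                    for j in range(nj)]
        elif region == "N":
            out.append((("N", 0), p_n_stay))
            out += [(("J", j), (1 - p_n_stay) * j_entry[j])
                    for j in range(nj)]
        elif pos < nj - 1:
            out.append((("J", pos + 1), 1.0))
        return out

    neg = float("-inf")
    score = [neg] * len(states)
    score[index[("V", 0)]] = log(emit(("V", 0), seq[0]))
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for base in seq[1:]:
        new = [neg] * len(states)
        ptr = [None] * len(states)
        for s, k in index.items():
            if score[k] == neg:
                continue
            for t, p in transitions(s):
                cand = score[k] + log(p) + log(emit(t, base))
                if cand > new[index[t]]:
                    new[index[t]] = cand
                    ptr[index[t]] = k
        back.append(ptr)
        score = new

    k = index[("J", nj - 1)]
    if score[k] == neg:
        raise ValueError("no valid V/N/J path for this sequence")
    path = [k]
    for ptr in reversed(back):  # trace the best path backwards
        k = ptr[k]
        path.append(k)
    path.reverse()
    return "".join(states[k][0] for k in path)

# Toy parameters (illustrative only, not estimated from data).
V_GERM, J_GERM = "ACGTACGA", "TTGGCC"
V_EXIT = [0, 0, 0, 0, 0.1, 0.3, 0.5, 1.0]
J_ENTRY = [0.5, 0.3, 0.2, 0, 0, 0]
MU_V, MU_J = [0.05] * 8, [0.05] * 6

# V with two 3' bases deleted, a two-base insertion "CC", then full J.
print(annotate("ACGTACCCTTGGCC", V_GERM, J_GERM, V_EXIT, J_ENTRY, MU_V, MU_J))
```

Because `V_EXIT`, `J_ENTRY`, `MU_V` and `MU_J` are plain lookup tables, each entry could be estimated separately from data for each allele, which is the sense in which such a model is parameter-rich. The self-looping `N` state implies a geometric insertion-length distribution, which is a simplification chosen here to keep the sketch short.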

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1b4/4709141/3d7bd2926a9b/pcbi.1004409.g001.jpg
