Bleakley Kevin, Giudicelli Véronique, Wu Yan, Lefranc Marie-Paule, Biau Gérard
Institut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, Equipe de Probabilités et Statistique, Université Montpellier II, Montpellier, France.
In Silico Biol. 2006;6(6):573-88.
The diversity of immunoglobulin (IG) and T cell receptor (TR) chains depends on several mechanisms: combinatorial diversity, which is a consequence of the number of V, D and J genes, and N-REGION diversity, which creates an extensive and clonal somatic diversity at the V-J and V-D-J junctions. For the IG, diversity is further increased by somatic hypermutations. The number of different junctions per chain and per individual is estimated to be 10^12. We chose the human TRAV-TRAJ junctions as an example in order to characterize the criteria required for a standardized analysis of the IG and TR V-J and V-D-J junctions, based on the IMGT-ONTOLOGY concepts, and to serve as a first IMGT junction reference set (IMGT, http://imgt.cines.fr). We performed a thorough statistical analysis of 212 human rearranged TRAV-TRAJ sequences, which were aligned and analysed by the integrated IMGT/V-QUEST software (which includes IMGT/JunctionAnalysis) and then manually verified by experts. Furthermore, we compared these 212 sequences with 37 other human TRAV-TRAJ junction sequences for which, according to expert verification, some particularities (potential sequence polymorphisms, sequencing errors, etc.) prevented IMGT/JunctionAnalysis from providing the correct biological results. Using statistical learning, we constructed an automatic warning system to predict whether new, automatically analysed TRAV-TRAJ sequences should be manually re-checked, and we estimated the robustness of this warning system.
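The abstract does not say which statistical learning method or which junction descriptors were used, so the following is only a minimal sketch of the general idea: a two-class warning rule ("ok" versus "recheck") whose robustness is estimated by leave-one-out cross-validation. The nearest-centroid classifier, the three feature names, and the toy values are all assumptions chosen for illustration; they are not taken from the paper or from IMGT/JunctionAnalysis output.

```python
from math import dist


def nearest_centroid_predict(train, query):
    """Return the label whose class centroid (mean feature vector) is closest to `query`."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], query))


def leave_one_out(data):
    """Hold out each sequence once and return (sensitivity, specificity),
    where the positive class is 'recheck'.
    Assumes `data` contains at least two examples of each class."""
    tp = fn = tn = fp = 0
    for i, (features, label) in enumerate(data):
        predicted = nearest_centroid_predict(data[:i] + data[i + 1:], features)
        if label == "recheck":
            tp += predicted == "recheck"
            fn += predicted != "recheck"
        else:
            tn += predicted == "ok"
            fp += predicted != "ok"
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical features per sequence (illustrative only, NOT from the paper):
# (N-REGION length, total trimmed nucleotides, mismatches against the germline V).
# The values below are invented toy data, not real junction measurements.
toy_data = [
    ((3, 4, 0), "ok"),
    ((5, 6, 0), "ok"),
    ((2, 3, 1), "ok"),
    ((4, 5, 0), "ok"),
    ((6, 4, 1), "ok"),
    ((15, 20, 6), "recheck"),
    ((18, 17, 8), "recheck"),
    ((14, 22, 5), "recheck"),
]

sensitivity, specificity = leave_one_out(toy_data)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(nearest_centroid_predict(toy_data, (16, 19, 7)))
```

Per-class rates are reported rather than overall accuracy because the two groups are unbalanced: with 212 verified and 37 problematic sequences, a rule that never raises a warning would already be right for 212 of 249 sequences (about 85%) while missing every sequence that needs re-checking.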