Michaeli Miri, Barak Michal, Hazanov Lena, Noga Hila, Mehr Ramit
The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel.
J Clin Bioinforma. 2013 Aug 27;3(1):15. doi: 10.1186/2043-9113-3-15.
Immunoglobulin (that is, antibody) and T cell receptor genes are created through somatic gene rearrangement from gene segment libraries. Immunoglobulin genes are further diversified by somatic hypermutation and selection during the immune response. Studying the repertoires of these genes yields valuable insights into immune system function in infections, aging, autoimmune diseases and cancers. The introduction of high-throughput sequencing has generated unprecedented amounts of repertoire and mutation data from immunoglobulin genes. However, common analysis programs are not appropriate for pre-processing and analyzing these data, because there is no template or reference for the whole gene.
We present the automated analysis pipeline we created for this purpose, which integrates software packages developed by us and by others, and demonstrate its performance.
Our analysis pipeline is highly modular and makes it possible to analyze data from high-throughput sequencing of immunoglobulin genes despite the lack of a template gene. An executable version of the Automation program (and its source code) is freely available for download from our website: http://immsilico2.lnx.biu.ac.il/Software.html.