Berry Charles C, Nobles Christopher, Six Emmanuelle, Wu Yinghua, Malani Nirav, Sherman Eric, Dryga Anatoly, Everett John K, Male Frances, Bailey Aubrey, Bittinger Kyle, Drake Mary J, Caccavelli Laure, Bates Paul, Hacein-Bey-Abina Salima, Cavazzana Marina, Bushman Frederic D
Department of Family Medicine and Public Health, UC San Diego, La Jolla, CA 92093, USA.
Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104-6076, USA.
Mol Ther Methods Clin Dev. 2016 Dec 18;4:17-26. doi: 10.1016/j.omtm.2016.11.003. eCollection 2017 Mar 17.
Analysis of sites of newly integrated DNA in cellular genomes is important to several fields, but methods for analyzing and visualizing these datasets are still under development. Here, we describe tools for data analysis and visualization that take as input integration site data from our INSPIIRED pipeline. Paired-end sequencing allows inference of the numbers of transduced cells as well as the distributions of integration sites in target genomes. We present interactive heatmaps that allow comparison of distributions of integration sites to genomic features and that support numerous user-defined statistical tests. To summarize integration site data from human gene therapy samples, we developed a reproducible report format that catalogs sample population structure, longitudinal dynamics, and integration frequency near cancer-associated genes. We also introduce a novel summary statistic, the UC50 (unique cell progenitors contributing the most expanded 50% of progeny cell clones), which provides a single number summarizing possible clonal expansion. Using these tools, we report the ongoing longitudinal characterization of a patient from the first trial to treat severe combined immunodeficiency-X1 (SCID-X1), showing successful reconstitution for 15 years accompanied by persistence of a cell clone with an integration site near the cancer-associated gene CCND2. Software is available at https://github.com/BushmanLab/INSPIIRED.
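As an illustration of the UC50 statistic described above, the following is a minimal Python sketch, not the INSPIIRED implementation. It assumes one reading of the definition: sort clones by inferred cell abundance in descending order and count how many of the largest clones are needed to account for half of all sampled cells. The function name `uc50`, the `fraction` parameter, and the example abundances are hypothetical and introduced only for illustration.

```python
from typing import Iterable


def uc50(abundances: Iterable[float], fraction: float = 0.5) -> int:
    """Count the most abundant clones needed to reach `fraction` of all cells.

    Assumption: `abundances` holds one inferred cell count per unique
    integration site (clone). This is a sketch of one reading of the
    abstract's definition, not the authors' code.
    """
    # Keep only clones that were actually observed, largest first.
    counts = sorted((a for a in abundances if a > 0), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0

    threshold = fraction * total
    running = 0.0
    for n_clones, count in enumerate(counts, start=1):
        running += count
        if running >= threshold:
            return n_clones
    return len(counts)


# Hypothetical samples: one dominated by a single clone, one evenly spread.
skewed = [60, 10, 10, 10, 10]
even = [10] * 10
print(uc50(skewed), uc50(even))
```

Under this reading, a small value means that a few clones account for most of the sampled cells, which may indicate clonal expansion, while a large value indicates a more evenly distributed population.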