San Raffaele Telethon Institute for Gene Therapy, IRCCS Ospedale San Raffaele, Via Olgettina 60, Milano 20132, Italy.
Centro Nazionale Analisi Fotogrammi (CNAF), Istituto Nazionale di Fisica Nucleare, Viale Carlo Berti Pichat 6/2, Bologna 40127, Italy.
Database (Oxford). 2023 Nov 2;2023. doi: 10.1093/database/baad069.
High-throughput clonal tracking in patients undergoing hematopoietic stem cell gene therapy with integrating vectors is instrumental in assessing biosafety and efficacy. Monitoring the fate of millions of transplanted clones and their progeny across differentiation and proliferation over time relies on identifying vector integration sites, which serve as surrogates of clonal identity. Although γ-tracking retroviral insertion sites (γ-TRIS) is the state-of-the-art algorithm for clonal identification, its tracking step is based on a combinatorial all-versus-all strategy, and the resulting computational cost limits its use in clinical studies with several thousand samples per patient. We developed the first clonal tracking graph database, InCliniGene (https://github.com/calabrialab/InCliniGene), which imports the output files of γ-TRIS and generates a graph in which clones (nodes) are connected by edges whenever two nodes share common genomic features as defined by the γ-TRIS rules. Because both the clonal data and their connections are embedded in the graph, InCliniGene can track all clones longitudinally across samples through data queries that fully explore the graph. This approach proved highly accurate and scalable. We validated InCliniGene on an in vitro dataset specifically designed to mimic clinical cases and tested its accuracy and precision. InCliniGene allows extensive use of γ-TRIS in large gene therapy clinical applications and naturally achieves full integration of molecular and genomic data, clinical and treatment measurements, and genomic annotations. Further extensions of InCliniGene with data federation and an application programming interface will support data mining toward precision, personalized and predictive medicine in gene therapy. Database URL: https://github.com/calabrialab/InCliniGene.
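The graph idea described in the abstract can be illustrated with a minimal sketch in plain Python, not in a graph database and not using InCliniGene's own code. Everything named here is hypothetical: the `IntegrationSite` record, the `share_features` rule (same chromosome and strand, positions within a small `window`), and the function names are assumptions chosen for illustration, since the abstract does not spell out the actual γ-TRIS rules. The sketch builds edges once and then tracks a clone across samples by traversing the graph, instead of re-comparing every sample against every other.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationSite:
    """Hypothetical record for one integration site observed in one sample."""
    sample: str
    chrom: str
    strand: str
    position: int


def share_features(a, b, window=3):
    # Assumed stand-in for the gamma-TRIS rules: same chromosome, same
    # strand, and genomic positions no more than `window` bases apart.
    return (a.chrom == b.chrom and a.strand == b.strand
            and abs(a.position - b.position) <= window)


def build_graph(sites, window=3):
    """Return an adjacency map: node -> set of connected nodes."""
    adjacency = {site: set() for site in sites}
    # Group by chromosome and strand, then sort by position, so only
    # nearby sites are compared rather than all pairs.
    buckets = defaultdict(list)
    for site in sites:
        buckets[(site.chrom, site.strand)].append(site)
    for bucket in buckets.values():
        bucket.sort(key=lambda s: s.position)
        for i, a in enumerate(bucket):
            for b in bucket[i + 1:]:
                if b.position - a.position > window:
                    break  # sorted order: later sites are even farther away
                if share_features(a, b, window):
                    adjacency[a].add(b)
                    adjacency[b].add(a)
    return adjacency


def track_clone(adjacency, start):
    """Collect the samples reachable from `start` by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return sorted({site.sample for site in seen})


# Toy data: three time points of one clone plus an unrelated site.
sites = [
    IntegrationSite("t1", "chr1", "+", 1000),
    IntegrationSite("t2", "chr1", "+", 1002),
    IntegrationSite("t3", "chr1", "+", 1004),
    IntegrationSite("t2", "chr2", "-", 500),
]
graph = build_graph(sites)
print(track_clone(graph, sites[0]))  # samples linked to the first site
print(track_clone(graph, sites[3]))  # the unrelated site stays on its own
```

In this toy input the first and third sites are four bases apart, so they have no direct edge under the assumed window, yet the traversal still links them through the intermediate site. That transitive linking is the property a graph representation makes cheap to query.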