Namias Alice, Sahlin Kristoffer, Makoundou Patrick, Bonnici Iago, Sicard Mathieu, Belkhir Khalid, Weill Mylène
ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
Department of Mathematics, Science for Life Laboratory, Stockholm University, 10691 Stockholm, Sweden.
Comput Struct Biotechnol J. 2023 Jul 16;21:3656-3664. doi: 10.1016/j.csbj.2023.07.012. eCollection 2023.
The importance of gene amplifications in evolution is increasingly recognized. Yet tools to study multi-copy gene families are still scarce, and many such families are overlooked by common sequencing methods. Haplotype reconstruction is even harder for polymorphic multi-copy gene families. Here, we show that all variants (or haplotypes) of a multi-copy gene family present in a single genome can be obtained using Oxford Nanopore Technologies sequencing of PCR products, followed by steps of mapping, SNP calling and haplotyping. As a proof of concept, we acquired the sequences of highly similar variants of the and genes present in the genome of the Pip, a bacterium infecting mosquitoes. Our method relies on a large database of genes previously acquired by cloning and Sanger sequencing. We addressed problems commonly faced when using mapping approaches for multi-copy gene families with highly similar variants. In addition, we confirmed that PCR amplification frequently produces chimeras, which must be carefully considered when working on families of recombinant genes. We tested the robustness of the method using a combination of bioinformatics (read simulations) and molecular biology approaches (sequence acquisition through cloning and Sanger sequencing, specific PCRs and digital droplet PCR). When the different haplotypes present within a single genome cannot be reconstructed from short-read sequencing, this pipeline provides high-throughput acquisition, gives reliable results, and offers insights into the relative copy numbers of the different variants.
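To make the idea of deriving relative copy numbers from long reads more concrete, here is a toy sketch. It is not the authors' pipeline, which uses mapping, SNP calling and haplotyping tools. It assumes each read has already been reduced to its alleles at informative SNP positions ('-' marks a position with no call), and the function names and example sequences are hypothetical. Each read is assigned to the closest known variant by Hamming distance, ties are counted as ambiguous, and read proportions serve as a rough proxy for relative copy number.

```python
from collections import Counter


def hamming(read: str, variant: str) -> int:
    """Count mismatches between a read and a variant, skipping '-' (no call) in the read."""
    return sum(1 for r, v in zip(read, variant) if r != "-" and r != v)


def assign_reads(reads, variants):
    """Assign each read to its closest known variant; tied reads are counted as 'ambiguous'."""
    counts = Counter()
    for read in reads:
        dists = {name: hamming(read, seq) for name, seq in variants.items()}
        best = min(dists.values())
        winners = [name for name, d in dists.items() if d == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
        else:
            counts["ambiguous"] += 1
    return counts


def relative_proportions(counts):
    """Fraction of unambiguously assigned reads per variant (a rough copy-number proxy)."""
    assigned = {k: v for k, v in counts.items() if k != "ambiguous"}
    total = sum(assigned.values())
    return {k: v / total for k, v in assigned.items()} if total else {}


# Hypothetical variants and reads, written as alleles at four informative SNP positions.
variants = {"A": "ACGT", "B": "ACTT", "C": "GCTA"}
reads = ["ACGT", "ACGT", "ACTT", "GCTA", "AC-T"]
counts = assign_reads(reads, variants)
print(counts)                        # A: 2, B: 1, C: 1, ambiguous: 1
print(relative_proportions(counts))  # A: 0.5, B: 0.25, C: 0.25
```

A nearest-variant assignment like this would silently absorb PCR chimeras, which are mosaics of two variants, and PCR amplification bias can distort read proportions. This is why, in practice, such estimates need independent checks, for example by digital PCR.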