Zhang Weiyi, Tariq Arslan, Jia Xinxin, Yan Jianbing, Fernie Alisdair R, Usadel Björn, Wen Weiwei
National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Key Laboratory of Horticultural Plant Biology (MOE), Hubei Hongshan Laboratory, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, China.
Institute for Biological Data Science, CEPLAS, Heinrich-Heine Universität, Düsseldorf, Germany.
Nat Protoc. 2025 Mar;20(3):690-708. doi: 10.1038/s41596-024-01063-2. Epub 2024 Oct 2.
Haplotype phasing is a pivotal step in genome analysis that identifies the specific combinations of genetic variants carried on each chromosome. Achieving chromosome-level genome phasing is a considerable challenge, particularly in organisms with large and complex genomes. To address this challenge, we have developed a robust, gamete cell-based phasing pipeline that comprises wet-laboratory procedures for plant sperm cell isolation, short-read sequencing and a bioinformatics workflow that generates chromosome-level phasing. The bioinformatics workflow is applicable to sperm cells from plants and from other organisms, for example, mammals. Our pipeline ensures high-quality single-nucleotide polymorphism (SNP) calling for each sperm cell and the subsequent construction of a high-density genetic map. The genetic map facilitates accurate chromosome-level genome phasing, enables the detection of crossover events and can be used to correct potential assembly errors. Our bioinformatics pipeline runs on a Linux system, and most of its steps can be executed in parallel, which speeds up the analysis. The entire workflow can be completed in 1 d. We include a practical example from our previous research using this protocol and provide the whole bioinformatics pipeline as a Docker image so that it can be easily adapted to other studies.
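The idea behind gamete-based phasing can be illustrated with a toy sketch. It is not the published pipeline: the function names, the 0/1 allele encoding, the `MISSING` marker and the simple adjacent-marker majority rule are all assumptions made for illustration. Because each sperm cell is haploid, alleles that co-occur across most cells at neighbouring heterozygous SNPs are inferred to lie on the same parental haplotype, and a switch of inferred haplotype within a single cell marks a putative crossover.

```python
from typing import List, Tuple

MISSING = -1  # hypothetical code for "no call at this SNP in this cell"


def phase_markers(cells: List[List[int]]) -> List[int]:
    """Greedy phasing of ordered heterozygous SNPs from haploid gamete calls.

    cells[i][j] is the allele (0 or 1) seen in cell i at marker j, or MISSING.
    Returns phase[j]: the allele carried by haplotype A at marker j
    (haplotype A is arbitrarily defined as carrying allele 0 at marker 0).
    """
    n_markers = len(cells[0])
    phase = [0] * n_markers
    for j in range(1, n_markers):
        same = diff = 0
        for cell in cells:
            a, b = cell[j - 1], cell[j]
            if a == MISSING or b == MISSING:
                continue
            if a == b:
                same += 1
            else:
                diff += 1
        # Majority vote; ties (including no shared calls) keep the previous
        # orientation, which is a simplification of this sketch.
        phase[j] = phase[j - 1] if same >= diff else 1 - phase[j - 1]
    return phase


def haplotype_origin(cell: List[int], phase: List[int]) -> List[int]:
    """Recode one cell's alleles as haplotype of origin: 0 = A, 1 = B."""
    return [MISSING if g == MISSING else g ^ p for g, p in zip(cell, phase)]


def find_crossovers(origin: List[int]) -> List[Tuple[int, int]]:
    """Return (left, right) marker indices flanking each haplotype switch."""
    crossovers = []
    prev = None
    for idx, h in enumerate(origin):
        if h == MISSING:
            continue
        if prev is not None and origin[prev] != h:
            crossovers.append((prev, idx))
        prev = idx
    return crossovers


# Made-up example: five markers, five sperm cells.
cells = [
    [0, 1, 1, 0, 1],        # haplotype A throughout
    [1, 0, 0, 1, 0],        # haplotype B throughout
    [0, 1, 0, 1, 0],        # A at markers 0-1, then B: one crossover
    [1, MISSING, 0, 1, 0],  # haplotype B with one missing call
    [0, 1, 1, MISSING, 1],  # haplotype A with one missing call
]
phase = phase_markers(cells)
for i, cell in enumerate(cells):
    print(i, find_crossovers(haplotype_origin(cell, phase)))
```

Real single-cell data are far sparser and noisier than this example, so a practical workflow would need genotype filtering, error tolerance and windowed rather than adjacent-marker comparisons.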