Wiener Emma, Cottino Laura, Botha Gerrit, Nyangiri Oscar, Noyes Harry, McLeod Annette, Jakubosky David, Adebamowo Clement, Awadalla Phillip, Landouré Guida, Matshaba Mogomotsi, Matovu Enock, Ramsay Michèle, Simo Gustave, Simuunza Martin, Tiemessen Caroline, Wonkam Ambroise, Sahibdeen Venesa, Krause Amanda, Lombard Zané, Hazelhurst Scott
Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa.
Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
Res Sq. 2024 Jul 8:rs.3.rs-4485126. doi: 10.21203/rs.3.rs-4485126/v1.
Structural variants are responsible for a large part of genomic variation between individuals and play a role in both common and rare diseases. Databases cataloguing structural variants do not represent the full spectrum of global diversity and, in particular, lack information from most African populations. To address this representation gap, we analysed 1,091 high-coverage African genomes, 545 of which are public datasets and 546 of which were analysed for structural variants for the first time. Variants were called using five different tools, and the datasets were merged and jointly called using SURVIVOR. We identified 67,795 structural variants throughout the genome, with 10,421 genes having at least one variant. Using a conservative overlap in the merged data, 6,414 of the structural variants (9.5%) are novel compared to the Database of Genomic Variants. This study contributes to knowledge of the landscape of structural variant diversity in Africa and presents a reliable dataset for potential applications in population genetics and health-related research.
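The abstract does not state the overlap criterion used to decide novelty against the Database of Genomic Variants. As an illustration only, the sketch below assumes a reciprocal-overlap rule, a common convention for comparing structural variant intervals: a call counts as known if it and a catalogue entry of the same type on the same chromosome each cover at least a chosen fraction of the other. The `SV` record, the function names and the 0.5 threshold are hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal record for a structural variant interval."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two fractions by which a and b cover each other."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        # Disjoint, touching, or zero-length intervals share no bases.
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def is_novel(call: SV, catalogue: Iterable[SV], min_overlap: float = 0.5) -> bool:
    """Treat a call as novel if no catalogue entry reaches the assumed threshold."""
    return all(reciprocal_overlap(call, known) < min_overlap for known in catalogue)


# Toy catalogue standing in for a reference database (made-up coordinates).
catalogue = [SV("chr1", 1000, 2000, "DEL")]
known_call = SV("chr1", 1100, 2100, "DEL")   # shares 900 of 1,000 bases both ways
novel_call = SV("chr1", 1900, 4000, "DEL")   # shares only 100 bases
```

Under these assumptions, `is_novel(known_call, catalogue)` is false and `is_novel(novel_call, catalogue)` is true. A stricter threshold, or an additional breakpoint-distance requirement, would classify more calls as novel, so a reported novelty rate depends on the rule chosen.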