De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences.

Affiliation

Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA.

Publication information

Genome Biol Evol. 2024 Sep 3;16(9). doi: 10.1093/gbe/evae188.

DOI:10.1093/gbe/evae188

PMID:39190003

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11384899/

Abstract

There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads begins with aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples that were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to such regions, including an identical 187 bp NRS found in both de novo assemblies. This NRS is located in HCN2, 79 bp downstream of Exon 3, and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variation.
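The two frequency figures quoted in the abstract (variants "present in at least 5% of the WGS samples" and a minor allele frequency of 0.45) can be made concrete with a minimal sketch. Everything here is illustrative: the function names are hypothetical, the genotype counts are invented toy values chosen only so that the arithmetic lands on 0.45, and reading the 5% threshold as the fraction of samples carrying at least one alternate allele is an assumption, not something the abstract specifies.

```python
def _called(genotypes):
    """Drop missing calls (None) and require at least one called sample."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return called


def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid alt-allele counts (0, 1, or 2)."""
    called = _called(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def carrier_fraction(genotypes):
    """Fraction of called samples carrying at least one alternate allele."""
    called = _called(genotypes)
    return sum(1 for g in called if g > 0) / len(called)


def passes_sample_filter(genotypes, min_fraction=0.05):
    """Assumed reading of 'present in at least 5% of the samples'."""
    return carrier_fraction(genotypes) >= min_fraction


# Toy cohort (invented, NOT the study's data): 100 samples with
# 30 homozygous reference, 50 heterozygous, 20 homozygous insertion.
toy = [0] * 30 + [1] * 50 + [2] * 20
print(minor_allele_frequency(toy))   # (50 + 2*20) / 200
print(passes_sample_filter(toy))
```

In this toy cohort the insertion allele accounts for 90 of 200 chromosomes, so a site with this genotype distribution would also clear the 5% sample filter by a wide margin.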

Similar articles

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data.

Genes (Basel). 2018 Oct 9;9(10):486. doi: 10.3390/genes9100486.

Identifying suitable tools for variant detection and differential gene expression using RNA-seq data.

Genomics. 2020 May;112(3):2166-2172. doi: 10.1016/j.ygeno.2019.12.011. Epub 2019 Dec 17.

ACMGA: a reference-free multiple-genome alignment pipeline for plant species.

BMC Genomics. 2024 May 25;25(1):515. doi: 10.1186/s12864-024-10430-y.

A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome.

Res Sq. 2023 Mar 6:rs.3.rs-2580940. doi: 10.21203/rs.3.rs-2580940/v1.

Efficient detection and assembly of non-reference DNA sequences with synthetic long reads.

Nucleic Acids Res. 2022 Oct 14;50(18):e108. doi: 10.1093/nar/gkac653.

Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles.

Nucleic Acids Res. 2024 Mar 21;52(5):2212-2230. doi: 10.1093/nar/gkae086.

NovoGraph: Human genome graph construction from multiple long-read assemblies.

F1000Res. 2018 Sep 3;7:1391. doi: 10.12688/f1000research.15895.2. eCollection 2018.

Misassembly of long reads undermines de novo-assembled ethnicity-specific genomes: validation in a Chinese Han population.

Hum Genet. 2019 Jul;138(7):757-769. doi: 10.1007/s00439-019-02032-6. Epub 2019 Jun 5.

A comparative investigation of single nucleotide variant calling for a personal non-Caucasian sequencing sample.

Genes Genomics. 2023 Dec;45(12):1527-1536. doi: 10.1007/s13258-023-01439-w. Epub 2023 Aug 31.
