De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences.

Affiliation

Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA.

Publication information

Genome Biol Evol. 2024 Sep 3;16(9). doi: 10.1093/gbe/evae188.

Abstract

There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads begins with aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two Indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants, each present in at least 5% of the WGS samples, that were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to such regions, including an identical 187 bp NRS found in both de novo assemblies. This NRS is located in HCN2, 79 bp downstream of exon 3, and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variants.
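
The two quantities the abstract reports, a minor allele frequency and a "present in at least 5% of samples" threshold, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the genotype encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), and the assumption that variant identifiers have already been harmonized between the two reference maps are all hypothetical choices made for illustration.

```python
def _called(genotypes):
    """Drop missing calls (None); raise if nothing is left."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return called


def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid alternate-allele counts (0, 1, 2)."""
    called = _called(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def carrier_fraction(genotypes):
    """Fraction of called samples carrying at least one alternate allele."""
    called = _called(genotypes)
    return sum(1 for g in called if g > 0) / len(called)


def newly_detected_common(calls_nrs, ids_hg38, min_fraction=0.05):
    """Variant IDs called with the modified reference, absent from the hg38
    call set, and carried by at least `min_fraction` of called samples.

    calls_nrs: dict mapping variant ID -> list of per-sample genotypes.
    ids_hg38:  set of variant IDs detected with the unmodified reference.
    Assumes (hypothetically) that IDs are comparable across both maps.
    """
    return sorted(
        vid
        for vid, gts in calls_nrs.items()
        if vid not in ids_hg38 and carrier_fraction(gts) >= min_fraction
    )


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    toy_calls = {
        "v1": [0, 1, 1, 2],        # carried by 3 of 4 samples
        "v2": [0] * 39 + [1],      # carried by 1 of 40 samples (2.5%)
        "v3": [1, 1, 0, 0],        # also detected with hg38
    }
    print(newly_detected_common(toy_calls, {"v3"}))
    print(minor_allele_frequency([0, 1, 2, 1]))
```

In the toy data, only `v1` passes both conditions: `v2` falls below the 5% threshold and `v3` is already in the hg38 call set. A real comparison would also need to translate coordinates between the hg38 and hg38-NRS maps before variant IDs can be matched, which this sketch leaves out.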
