Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity.

Affiliations

Department of Bioengineering, Stanford University, Stanford, California 94305, USA;

Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA.

Publication information

Genome Res. 2023 Oct;33(10):1734-1746. doi: 10.1101/gr.277175.122. Epub 2023 Oct 25.

DOI: 10.1101/gr.277175.122
PMID: 37879860
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10691534/
Abstract

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: it is missing large sections of heterochromatic sequence, and as a single, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the region of the genome to which a subsequence most likely belongs, given the distribution of the subsequence across the unmapped reads and the phasings of families. Validated on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from the WGSs of more than 700 families and compared ASLAN localizations to alignments of the 100-mers to the recently released T2T-CHM13 assembly. We found that many reads left unmapped by GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
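
The abstract describes ASLAN's maximum likelihood localization only at a high level. The following is a minimal sketch of the general idea of family-based localization, assuming a deliberately simplified model that is not the published ASLAN model. Each of the four parental haplotypes either carries the sequence or not. An individual is observed to have the sequence in their unmapped reads if either inherited haplotype carries it, and each observation is flipped with error rate `eps`. Phasing supplies, for each candidate region, which maternal and paternal haplotype each child inherited. The function names (`family_likelihood`, `localize`), the dictionary layout, `eps`, and the carrier prior `carrier_freq` are all hypothetical.

```python
import math
from itertools import product


def family_likelihood(obs_mother, obs_father, obs_children, child_inheritance,
                      eps=0.01, carrier_freq=0.5):
    """P(observed presence pattern | candidate region) for one nuclear family.

    Hypothetical simplified model (not ASLAN's published model): each parental
    haplotype (m0, m1, p0, p1) carries the sequence with prior probability
    carrier_freq; an individual has the sequence if either of their two
    haplotypes carries it; each observation is wrong with probability eps.
    child_inheritance lists, per child, the (maternal_index, paternal_index)
    inherited in this region, as derived from the family's phasing.
    """
    observed = [obs_mother, obs_father, *obs_children]
    total = 0.0
    # Marginalize over all 2**4 carrier states of the parental haplotypes.
    for m0, m1, p0, p1 in product((False, True), repeat=4):
        prior = 1.0
        for carries in (m0, m1, p0, p1):
            prior *= carrier_freq if carries else 1.0 - carrier_freq
        # Expected presence: parents first, then each child from its inherited haplotypes.
        expected = [m0 or m1, p0 or p1]
        for mat_idx, pat_idx in child_inheritance:
            expected.append((m0, m1)[mat_idx] or (p0, p1)[pat_idx])
        like = prior
        for exp, obs in zip(expected, observed):
            like *= (1.0 - eps) if exp == obs else eps
        total += like
    return total


def localize(families, regions, eps=0.01):
    """Return (region, log-likelihood) for the region with the highest summed log-likelihood."""
    best_region, best_ll = None, -math.inf
    for region in regions:
        ll = sum(
            math.log(family_likelihood(f["mother"], f["father"], f["children"],
                                       f["inheritance"][region], eps))
            for f in families
        )
        if ll > best_ll:
            best_region, best_ll = region, ll
    return best_region, best_ll


# Hypothetical family: the mother and first child have the 100-mer in their
# unmapped reads; the father and second child do not.
family = {
    "mother": True, "father": False, "children": [True, False],
    # In region "A" the children inherited different maternal haplotypes;
    # in region "B" they inherited identical haplotypes.
    "inheritance": {"A": [(0, 0), (1, 1)], "B": [(0, 0), (0, 0)]},
}
print(localize([family], ["A", "B"])[0])  # prints: A
```

Region "B" can explain the siblings' different observations only through an observation error, so the family favors region "A". This sketch sums over the unknown carrier states rather than picking the single best one, so every consistent configuration contributes. With many families, the per-family log-likelihoods add up and sharpen the localization. A sequence present in every individual carries no positional information under this model.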

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c65/10691534/ea8258945cdf/1734f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c65/10691534/1a5ff00e132b/1734f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c65/10691534/ba8fe607a24a/1734f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c65/10691534/90a4c617e177/1734f04.jpg

Similar articles

1
Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity.
Genome Res. 2023 Oct;33(10):1734-1746. doi: 10.1101/gr.277175.122. Epub 2023 Oct 25.
2
T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese.
Genomics Proteomics Bioinformatics. 2023 Dec;21(6):1085-1100. doi: 10.1016/j.gpb.2023.08.001. Epub 2023 Aug 16.
3
Genome-wide maps of highly-similar intrachromosomal repeats that mediate ectopic recombination in three human genome assemblies.
bioRxiv. 2024 Jan 31:2024.01.29.577884. doi: 10.1101/2024.01.29.577884.
4
The complete sequence of a human Y chromosome.
Nature. 2023 Sep;621(7978):344-354. doi: 10.1038/s41586-023-06457-y. Epub 2023 Aug 23.
5
A complete reference genome improves analysis of human genetic variation.
Science. 2022 Apr;376(6588):eabl3533. doi: 10.1126/science.abl3533. Epub 2022 Apr 1.
6
Enhancing Variant Calling in Whole-exome Sequencing Data Using Population-matched Reference Genomes.
Genomics Proteomics Bioinformatics. 2024 Dec 3;22(5). doi: 10.1093/gpbjnl/qzae070.
7
A Method for Localizing Non-Reference Sequences to the Human Genome.
Pac Symp Biocomput. 2022;27:313-324.
8
Inversion polymorphism in a complete human genome assembly.
Genome Biol. 2023 Apr 30;24(1):100. doi: 10.1186/s13059-023-02919-8.
9
Characterization of large-scale genomic differences in the first complete human genome.
Genome Biol. 2023 Jul 4;24(1):157. doi: 10.1186/s13059-023-02995-w.
10
Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.
Hum Genet. 2016 Jul;135(7):727-40. doi: 10.1007/s00439-016-1667-5. Epub 2016 Apr 9.