
UniAligner: a parameter-free framework for fast sequence alignment.

Affiliations

Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA.

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.

Publication information

Nat Methods. 2023 Sep;20(9):1346-1354. doi: 10.1038/s41592-023-01970-4. Epub 2023 Aug 14.

DOI: 10.1038/s41592-023-01970-4
PMID: 37580559

Abstract

Even though the recent advances in 'complete genomics' revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
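
The abstract does not spell out how the sequence-dependent scoring is computed, so the following is only a toy illustration of the general idea of favoring rare substrings. It is not UniAligner's algorithm. The function names and the weight formula (one divided by the product of a k-mer's counts in the two sequences) are assumptions made for this sketch.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rarity_weighted_matches(a, b, k):
    """List shared k-mer matches as (pos_in_a, pos_in_b, kmer, weight).

    Assumed weight for this sketch: 1 / (count_in_a * count_in_b).
    A k-mer that is unique in both sequences gets weight 1.0, while a
    k-mer repeated many times (as in a tandem repeat) gets a small weight.
    Because the weights come from the two input sequences themselves,
    no scoring parameters are supplied by the user.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)

    # Index the positions of each k-mer in b for direct lookup.
    positions_b = {}
    for j in range(len(b) - k + 1):
        positions_b.setdefault(b[j:j + k], []).append(j)

    matches = []
    for i in range(len(a) - k + 1):
        kmer = a[i:i + k]
        for j in positions_b.get(kmer, []):
            weight = 1.0 / (counts_a[kmer] * counts_b[kmer])
            matches.append((i, j, kmer, weight))
    return matches


if __name__ == "__main__":
    # Two short repeat-like strings; only "AT" is unique in both.
    for match in rarity_weighted_matches("AAAAT", "AAAT", 2):
        print(match)
```

In this sketch the repeated k-mer "AA" produces many low-weight candidate matches, while the single "AT" match carries full weight, so a downstream chaining or alignment step that maximizes total weight would anchor on the rare match first.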
