KmerKeys: a web resource for searching indexed genome assemblies and variants.

Affiliations

Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA.

Stanford Genome Technology Center West, Stanford University, Palo Alto, CA, 94304, USA.

Publication information

Nucleic Acids Res. 2022 Jul 5;50(W1):W448-W453. doi: 10.1093/nar/gkac266.

DOI: 10.1093/nar/gkac266
PMID: 35474383
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9252721/
Abstract

K-mers are short DNA sequences used in genome sequence analysis, with applications that include genome assembly and alignment. However, wider bioinformatic use of these short sequences is limited by the massive scale of genomic sequence data: a single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. The resulting web application, KmerKeys, provides fast queries for cloud computation on genome assemblies and supports both fuzzy and exact sequence searches. For robust and fast performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. The method employs a scalable and efficient data structure that can jointly index and search a large collection of human genome assemblies. Variant databases and their associated metadata, such as the gnomAD population variant catalogue, can also be included. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.
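
As a rough illustration of the idea in the abstract (a hash table keyed by short sequences, queried exactly or fuzzily), here is a minimal sketch. It is not the KmerKeys implementation: a plain Python `dict` stands in for the cache-friendly hash table, there is no memory mapping or parallelism, and "fuzzy" is assumed here to mean a Hamming distance of at most one. All function names are hypothetical.

```python
from collections import defaultdict

# 2-bit code per base; this particular encoding is an assumption of the sketch.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base (raises KeyError on non-ACGT)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def decode_kmer(value, k):
    """Inverse of encode_kmer for a k-mer of known length k."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


def build_index(sequence, k):
    """Map each encoded k-mer to the 0-based positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in ENCODE for base in kmer):  # skip k-mers containing N etc.
            index[encode_kmer(kmer)].append(i)
    return index


def exact_search(index, query):
    """Return the positions of an exact k-mer match, or an empty list."""
    return index.get(encode_kmer(query), [])


def fuzzy_search(index, query):
    """Return {k-mer: positions} for indexed k-mers within Hamming distance 1."""
    k = len(query)
    code = encode_kmer(query)
    hits = {}
    for pos in range(k):
        shift = 2 * (k - 1 - pos)
        for b in range(4):
            # Overwrite the base at `pos` with b; b equal to the original
            # base reproduces the query itself, so exact hits are included.
            candidate = (code & ~(3 << shift)) | (b << shift)
            if candidate in index:
                hits[decode_kmer(candidate, k)] = index[candidate]
    return hits
```

For example, `build_index("ACGTACGTTACG", 4)` followed by `exact_search(index, "ACGT")` returns the two start positions of `ACGT`, while `fuzzy_search(index, "ACGA")` finds the same k-mer through a single-base substitution. Packing bases into integers keeps keys small and fixed-width, which is one common way to make short-sequence hash lookups cheap.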

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7519/9252721/56241febed68/gkac266fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7519/9252721/4fbeb5c67461/gkac266fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7519/9252721/0a806a4c3318/gkac266fig3.jpg

Similar articles

1
KmerKeys: a web resource for searching indexed genome assemblies and variants.
Nucleic Acids Res. 2022 Jul 5;50(W1):W448-W453. doi: 10.1093/nar/gkac266.
2
ntCard: a streaming algorithm for cardinality estimation in genomics data.
Bioinformatics. 2017 May 1;33(9):1324-1330. doi: 10.1093/bioinformatics/btw832.
3
Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms.
BMC Bioinformatics. 2012 Jul 18;13:170. doi: 10.1186/1471-2105-13-170.
4
Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.
Bioinformatics. 2019 Nov 1;35(22):4560-4567. doi: 10.1093/bioinformatics/btz273.
5
A space and time-efficient index for the compacted colored de Bruijn graph.
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
6
Turtle: identifying frequent k-mers with cache-efficient algorithms.
Bioinformatics. 2014 Jul 15;30(14):1950-7. doi: 10.1093/bioinformatics/btu132. Epub 2014 Mar 10.
7
Querying large read collections in main memory: a versatile data structure.
BMC Bioinformatics. 2011 Jun 17;12:242. doi: 10.1186/1471-2105-12-242.
8
Ψ-RA: a parallel sparse index for genomic read alignment.
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-12-S2-S7. Epub 2011 Jul 27.
9
Compact representation of k-mer de Bruijn graphs for genome read assembly.
BMC Bioinformatics. 2013 Oct 23;14:313. doi: 10.1186/1471-2105-14-313.
10
SeqWare Query Engine: storing and searching sequence data in the cloud.
BMC Bioinformatics. 2010 Dec 21;11 Suppl 12(Suppl 12):S2. doi: 10.1186/1471-2105-11-S12-S2.

Cited by

1
A survey of k-mer methods and applications in bioinformatics.
Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec.
2
Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome.
Cell Rep Methods. 2023 Aug 2;3(8):100543. doi: 10.1016/j.crmeth.2023.100543. eCollection 2023 Aug 28.