Ultrafast and ultralarge multiple sequence alignments using TWILIGHT.

Author information

Tseng Yu-Hsiang, Walia Sumit, Turakhia Yatish

Affiliation

Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, United States.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i332-i341. doi: 10.1093/bioinformatics/btaf212.

DOI: 10.1093/bioinformatics/btaf212
PMID: 40662833
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12261412/
Abstract

MOTIVATION

Multiple sequence alignment (MSA) is a fundamental operation in bioinformatics, yet existing MSA tools are struggling to keep up with the speed and volume of incoming data. This is because the runtimes and memory requirements of current MSA tools become untenable when processing large numbers of long input sequences, and they also fail to fully harness the parallelism provided by modern CPUs and GPUs.
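
To see why runtimes and memory grow so quickly with long inputs, consider the pairwise building block that many MSA tools rely on: a dynamic-programming (DP) recurrence whose cost scales with the product of the two sequence lengths. The sketch below is the textbook Needleman-Wunsch global-alignment score with a linear gap penalty. It is not TWILIGHT's code, and the scoring values (match = 1, mismatch = -1, gap = -2) are arbitrary illustrative choices.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty).

    Time is O(len(a) * len(b)). Only two DP rows are kept, so memory is O(len(b)).
    Scoring parameters are illustrative, not taken from any particular tool.
    """
    # Row 0: prefixes of b aligned entirely against gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: the prefix a[:i] aligned entirely against gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(nw_score("ACGT", "AGT"))  # best alignment is ACGT / A-GT
```

Keeping only two DP rows cuts score-only memory from O(mn) to O(n), but the time stays O(mn): aligning two 30,000-nt sequences still fills 9 × 10^8 cells, and recovering the alignment itself (not just its score) needs extra machinery such as Hirschberg's divide-and-conquer method.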

RESULTS

We present Tall and Wide Alignments at High Throughput (TWILIGHT), a novel MSA tool optimized for speed, accuracy, scalability, and memory efficiency, with both CPU and GPU support. TWILIGHT incorporates new parallelization and memory-efficiency strategies that enable it to build ultralarge alignments at high speed even on memory-constrained devices. On challenging datasets, TWILIGHT outperformed all other tools tested in both speed and accuracy. It scaled beyond the limits of existing tools, aligning 1 million RNASim sequences within 30 min while using less than 16 GB of memory. TWILIGHT is the first tool to align more than 8 million publicly available SARS-CoV-2 sequences, setting a new standard for large-scale genomic alignment and data analysis.
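
The abstract does not say how TWILIGHT parallelizes its work. As a generic illustration of one kind of parallelism available to progressive aligners (an assumption here, not a description of TWILIGHT's algorithm), sibling subtrees of a guide tree can be aligned independently. Internal nodes can therefore be grouped by height, and every node in a group can be handed to a separate worker once the previous group is done. The tree encoding (a dict from node name to its list of children) and the node names are hypothetical.

```python
from collections import defaultdict


def schedule_by_height(children: dict[str, list[str]], root: str) -> list[list[str]]:
    """Group the internal nodes of a rooted guide tree by height.

    A leaf (input sequence) has height 0; an internal node has height
    1 + the maximum height of its children. Internal nodes of equal height
    depend only on lower levels, so each returned group could be processed
    in parallel before moving on to the next one.
    """
    height: dict[str, int] = {}
    levels: defaultdict = defaultdict(list)

    # Iterative post-order traversal avoids recursion limits on deep trees.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            height[node] = 0  # leaf: nothing to align at this node
        elif expanded:
            # All children have been visited, so their heights are known.
            height[node] = 1 + max(height[k] for k in kids)
            levels[height[node]].append(node)
        else:
            # Revisit this node after all of its children.
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return [levels[h] for h in sorted(levels)]


# Hypothetical guide tree over five sequences s1..s5.
tree = {"root": ["n1", "n2"], "n1": ["s1", "s2"], "n2": ["n3", "s5"], "n3": ["s3", "s4"]}
for level, nodes in enumerate(schedule_by_height(tree, "root"), start=1):
    print(level, sorted(nodes))
```

Height-based grouping is only one scheduling choice. On unbalanced trees, a task queue that starts each node as soon as both of its children finish can expose more parallelism.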

AVAILABILITY AND IMPLEMENTATION

TWILIGHT's code is freely available under the MIT license at https://github.com/TurakhiaLab/TWILIGHT. The test datasets and experimental results, including our alignment of 8 million SARS-CoV-2 sequences, are available at https://zenodo.org/records/14722035.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3def/12261412/524f8fc6ed6f/btaf212f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3def/12261412/a8dd6eb8e68c/btaf212f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3def/12261412/2cefa2fb3dbe/btaf212f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3def/12261412/1a7c07bba215/btaf212f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3def/12261412/6932da92509e/btaf212f5.jpg
