The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.

Affiliations

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK.

Modernising Medical Microbiology Consortium, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DU, UK.

Publication information

Viruses. 2019 Apr 26;11(5):394. doi: 10.3390/v11050394.

DOI:10.3390/v11050394
PMID:31035503
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6563281/
Abstract

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
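
The idea of numerically representing a read and then compressing it can be illustrated with a minimal sketch. The abstract does not name the transforms used, so everything here is an assumption made for illustration: the nucleotide-to-number mapping, the cumulative-sum representation, the choice of piecewise aggregate approximation (PAA, a standard time-series dimensionality reduction), and all function names are hypothetical and are not taken from the paper.

```python
from math import sqrt

# Hypothetical nucleotide-to-number mapping, chosen only for illustration.
NUC_VALUES = {"A": -2.0, "C": -1.0, "G": 1.0, "T": 2.0}


def to_signal(seq):
    """Turn a DNA string into a cumulative-sum numeric series."""
    signal, total = [], 0.0
    for base in seq.upper():
        total += NUC_VALUES.get(base, 0.0)  # ambiguous bases such as N add 0
        signal.append(total)
    return signal


def paa(signal, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame."""
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        frame = signal[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced


def distance(a, b):
    """Euclidean distance between two compressed series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    read = "ACGTACGTACGTACGT"             # 16 bases
    compressed = paa(to_signal(read), 4)  # 16 values reduced to 4
    print(compressed)                     # [-1.75, -1.75, -1.75, -1.75]
    other = paa(to_signal("TTTTTTTTTTTTTTTT"), 4)
    print(distance(compressed, other))
```

Comparing the 4-value approximations instead of the 16-base strings is what makes the comparison cheaper. How much accuracy survives a given compression level is the empirical question the paper addresses, and this sketch does not answer it.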


