Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data.

Author information

Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA.

Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, VT 05405, USA.

Publication information

Genomics. 2021 Jan;113(1 Pt 2):1189-1198. doi: 10.1016/j.ygeno.2020.12.004. Epub 2020 Dec 7.

DOI: 10.1016/j.ygeno.2020.12.004
PMID: 33301893
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7856238/
Abstract

Numerous viral sequences have been reported in the whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random-mapping or noise originating from sample preparation, sequencing processes, or other sources. Identification of patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among DNA sources and sequencing facilities which produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads among the seven sequencing facilities were significantly different (P < 2 × 10). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value <0.01). Possible origins of these reads include artificial sequences in library preparation, recombinant vectors in cell culture, and phages co-contaminated with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.
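The abstract reports associations at an adjusted P value below 0.01 but does not name the multiple-testing correction. As a minimal sketch only, the snippet below assumes a Benjamini-Hochberg adjustment applied to one raw p-value per viral reference. The function names, the reference labels, and the p-values are all hypothetical illustrations, not data or code from the study.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values, keyed like the input."""
    # Sort references by raw p-value, smallest first.
    items = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        name, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def associated_references(p_values: Dict[str, float], alpha: float = 0.01) -> List[str]:
    """List references whose adjusted p-value falls below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return sorted(name for name, q in adjusted.items() if q < alpha)


# Hypothetical raw p-values from a facility-association test,
# one per viral reference (labels invented for illustration).
raw = {"phage_A": 0.0001, "vector_B": 0.002, "virus_C": 0.03, "virus_D": 0.5}

if __name__ == "__main__":
    print(benjamini_hochberg(raw))
    print(associated_references(raw))
```

With these invented inputs, only the two references with the smallest raw p-values stay below the 0.01 threshold after adjustment. A real analysis would take its raw p-values from a test of read counts against sequencing facility or DNA source.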

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637a/7856238/cede1dcd86cf/nihms-1652683-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637a/7856238/042e8911b580/nihms-1652683-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637a/7856238/0ece24c70e05/nihms-1652683-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637a/7856238/cb8a84870bad/nihms-1652683-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637a/7856238/9f7c9de8821e/nihms-1652683-f0005.jpg
