Large scale comparison of non-human sequences in human sequencing data.

Authors

Tae Hongseok, Karunasena Enusha, Bavarva Jasmin H, McIver Lauren J, Garner Harold R

Affiliations

Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.

Publication information

Genomics. 2014 Dec;104(6 Pt B):453-8. doi: 10.1016/j.ygeno.2014.08.009. Epub 2014 Aug 27.

Abstract

Several studies have demonstrated that unmapped reads in next generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonality in non-human sequences by infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that 0.13% of reads on average showed similarities to non-human genomes. We compared results among different sample groups divided based on ethnicities, sequencing centers and enrichment methods (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.
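The comparison described in the abstract comes down to a per-sample fraction of reads with non-human hits, averaged within sample groups such as sequencing center or enrichment method. The following is a minimal sketch of that arithmetic only, not the authors' pipeline. The field names (`center`, `method`, `total_reads`, `nonhuman_reads`), the helper functions, and all counts are hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def nonhuman_fraction(total_reads, nonhuman_reads):
    """Fraction of one sample's reads that matched a non-human genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return nonhuman_reads / total_reads


def mean_fraction_by_group(samples, key):
    """Average the per-sample non-human fraction within each group.

    `key` names the grouping field, e.g. "center" or "method".
    """
    grouped = defaultdict(list)
    for sample in samples:
        fraction = nonhuman_fraction(sample["total_reads"], sample["nonhuman_reads"])
        grouped[sample[key]].append(fraction)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical counts, invented for illustration (not data from the paper).
samples = [
    {"center": "A", "method": "WGS", "total_reads": 1_000_000, "nonhuman_reads": 1_000},
    {"center": "A", "method": "exome", "total_reads": 2_000_000, "nonhuman_reads": 6_000},
    {"center": "B", "method": "WGS", "total_reads": 500_000, "nonhuman_reads": 500},
]

by_center = mean_fraction_by_group(samples, "center")
by_method = mean_fraction_by_group(samples, "method")

for group, value in sorted(by_center.items()):
    print(f"center {group}: {value:.2%} of reads non-human on average")
```

Averaging per-sample fractions, rather than pooling all reads, keeps a single very deep sample from dominating its group. Whether the paper used this exact weighting is not stated in the abstract.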
