Pratas Diogo, Hosseini Morteza, Grilo Gonçalo, Pinho Armando J, Silva Raquel M, Caetano Tânia, Carneiro João, Pereira Filipe
Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal.
Genes (Basel). 2018 Sep 6;9(9):445. doi: 10.3390/genes9090445.
The sequencing of ancient DNA samples provides a novel way to find, characterize, and distinguish exogenous genomes of endogenous targets. After sequencing, computational composition analysis enables filtering of undesired sources in the focal organism, with the purpose of improving the quality of assemblies and subsequent data analysis. More importantly, such analysis allows extinct and extant species to be identified without requiring a specific or new sequencing run. However, the identification of exogenous organisms is a complex task, given the nature and degradation of the samples and the evident need for efficient computational tools that rely on algorithms that are both fast and highly sensitive. In this work, we relied on a fast and highly sensitive tool, FALCON-meta, which measures similarity against whole-genome reference databases, to analyse the metagenomic composition of an ancient polar bear (Ursus maritimus) jawbone fossil. The fossil was collected in Svalbard, Norway, and has an estimated age of 110,000 to 130,000 years. The FASTQ samples contained 349 GB of nonamplified shotgun sequencing data. We identified and localized, relative to the FASTQ samples, the genomes with significant similarities to reference microbial genomes, including those of viruses, bacteria, and archaea, and to fungal, mitochondrial, and plastidial sequences. Among other striking features, we found that modern-human and some bacterial and viral sequences (contamination), as well as the organelle sequences of wild carrot and tomato, showed significant similarity relative to the whole samples. For each exogenous candidate, we ran a damage pattern analysis, which, in addition to revealing shallow levels of damage in the plant candidates, identified the source as contamination.
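The damage pattern analysis mentioned above can be illustrated with a minimal sketch. The abstract does not state which tool or which damage signal was used, so the following assumes one commonly used signal of ancient DNA damage: an excess of C-to-T mismatches near the 5' end of reads, caused by cytosine deamination. The function name `c_to_t_profile`, its parameters, and the toy input are hypothetical and are not taken from the paper or from FALCON-meta. The input is assumed to be gap-free, equal-length (reference, read) pairs that are already aligned.

```python
def c_to_t_profile(pairs, n_positions=5):
    """Per-position C->T mismatch frequency at the 5' end of reads.

    pairs: iterable of (reference, read) strings, assumed to be aligned
    without gaps and oriented 5' to 3' (an assumption of this sketch).
    Returns a list of length n_positions; an entry is None when no
    reference cytosine was observed at that position.
    """
    c_total = [0] * n_positions   # reference C seen at each position
    c_to_t = [0] * n_positions    # of those, how many read as T
    for ref, read in pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [
        c_to_t[i] / c_total[i] if c_total[i] else None
        for i in range(n_positions)
    ]


# Toy input (not real data): the first read carries a C->T change at position 0.
toy_pairs = [("CCGT", "TCGT"), ("CAGT", "CAGT")]
profile = c_to_t_profile(toy_pairs)
```

Under this assumption, a profile whose frequency rises sharply toward position 0 would be consistent with ancient, damaged DNA, whereas a flat, low profile (shallow damage) would be more consistent with recent contamination, which is the interpretation the abstract reports for the plant candidates.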