Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences.

Affiliation

Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany.

Publication information

BMC Biol. 2018 Oct 25;16(1):121. doi: 10.1186/s12915-018-0581-9.

Abstract

BACKGROUND

The study of ancient DNA is hampered by degradation, resulting in short DNA fragments. Advances in laboratory methods have made it possible to retrieve short DNA fragments, thereby improving access to DNA preserved in highly degraded, ancient material. However, such material contains large amounts of microbial contamination in addition to DNA fragments from the ancient organism. The resulting mixture of sequences constitutes a challenge for computational analysis, since microbial sequences are hard to distinguish from the ancient sequences of interest, especially when they are short.

RESULTS

Here, we develop a method to quantify spurious alignments based on the presence or absence of rare variants. We find that spurious alignments are enriched for mismatches and insertion/deletion differences and lack substitution patterns typical of ancient DNA. The impact of spurious alignments can be reduced by filtering on these features and by imposing a sample-specific minimum length cutoff. We apply this approach to sequences from four ~430,000-year-old Sima de los Huesos hominin remains, which contain particularly short DNA fragments, and increase the amount of usable sequence data by 17-150%. This allows us to place a third specimen from the site on the Neandertal lineage.
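The filtering idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Alignment` record, the function names, and every threshold (`min_length`, `max_other_mismatches`, `window`) are hypothetical choices made for illustration. It assumes that the "substitution patterns typical of ancient DNA" are deamination-type changes (C-to-T near the 5' end, G-to-A near the 3' end), and that read and reference are gap-free strings of equal length.

```python
from dataclasses import dataclass

# Hypothetical record for one aligned read (not a real library type).
@dataclass
class Alignment:
    read: str        # read bases, same orientation and length as `ref`
    ref: str         # reference bases under the read
    has_indel: bool  # True if the alignment contains an insertion/deletion

# Substitutions assumed to be typical of ancient DNA damage.
DEAMINATION = {("C", "T"), ("G", "A")}

def other_mismatches(aln: Alignment) -> int:
    """Count mismatches that are not deamination-type substitutions."""
    return sum(
        1 for r, q in zip(aln.ref, aln.read)
        if r != q and (r, q) not in DEAMINATION
    )

def has_terminal_deamination(aln: Alignment, window: int = 3) -> bool:
    """True if a C-to-T occurs in the first `window` positions
    or a G-to-A occurs in the last `window` positions."""
    pairs = list(zip(aln.ref, aln.read))
    five_prime = any((r, q) == ("C", "T") for r, q in pairs[:window])
    three_prime = any((r, q) == ("G", "A") for r, q in pairs[-window:])
    return five_prime or three_prime

def keep(aln: Alignment, min_length: int = 35,
         max_other_mismatches: int = 0,
         require_deamination: bool = True) -> bool:
    """Apply the assumed filters: length cutoff, no indels,
    few non-damage mismatches, optional damage signal."""
    if len(aln.read) < min_length:
        return False
    if aln.has_indel:
        return False
    if other_mismatches(aln) > max_other_mismatches:
        return False
    if require_deamination and not has_terminal_deamination(aln):
        return False
    return True
```

In practice `min_length` would be chosen per sample, as the abstract describes, rather than fixed at a single value; the default of 35 here is only a placeholder.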

CONCLUSIONS

Our method maximizes the sequence data amenable to genetic analysis from highly degraded ancient material and avoids pitfalls that are associated with the analysis of ultra-short DNA sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c596/6202837/01a5b33b9d35/12915_2018_581_Fig1_HTML.jpg
