Kraft Louis, Korneliussen Thorfinn Sand, Sackett Peter Wad, Renaud Gabriel
Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen K, 1350, Denmark.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf407.
DNA damage patterns, such as increased frequencies of C→T and G→A substitutions at fragment ends, are widely used in ancient DNA studies to assess authenticity and detect contamination. In metagenomic studies, fragments can be mapped against multiple references or de novo assembled contigs to identify those likely to be ancient. Generating and comparing damage profiles, however, can be both tedious and time-consuming. Although tools exist for estimating damage in single reference genomes and metagenomic datasets, none efficiently cluster damage patterns.
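As an illustration of what a damage pattern is, the following minimal sketch computes per-position C→T frequencies at the 5' end and G→A frequencies at the 3' end from a handful of toy alignments. It is not AdDeam's implementation: the function name `substitution_frequency`, its parameters, and the input format (gap-free reference/read string pairs already oriented 5'→3') are assumptions made for the example only.

```python
def substitution_frequency(pairs, ref_base, read_base, n_pos=5, from_3p=False):
    """Frequency of ref_base -> read_base at each of the first n_pos positions.

    pairs: iterable of (reference, read) strings of equal length, gap-free
           (a simplifying assumption; real alignments contain indels and clipping).
    from_3p: if True, positions are counted inward from the 3' end.
    """
    totals = [0] * n_pos  # times ref_base was seen in the reference at each position
    hits = [0] * n_pos    # times it was read as read_base
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        if from_3p:
            # Reverse (not reverse-complement) so index 0 is the last base.
            ref, read = ref[::-1], read[::-1]
        for i in range(min(n_pos, len(read))):
            if ref[i] == ref_base:
                totals[i] += 1
                if read[i] == read_base:
                    hits[i] += 1
    # Positions where ref_base never occurs are reported as 0.0.
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


# Toy (reference, read) pairs, invented for illustration.
pairs = [
    ("CACGTG", "TACGTA"),
    ("CCGTAG", "TCGTAA"),
    ("CAGTCG", "CAGTCG"),
    ("ACCTGG", "ACCTGA"),
]

ct_5p = substitution_frequency(pairs, "C", "T", n_pos=3)
ga_3p = substitution_frequency(pairs, "G", "A", n_pos=3, from_3p=True)
print(ct_5p, ga_3p)
```

In this toy input the substitutions are concentrated at the first 5' and last 3' positions, which is the signature the abstract refers to.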
To address this methodological gap, we developed AdDeam, a tool that combines rapid damage estimation with clustering to streamline analyses and make potential contaminants or outliers easy to identify. AdDeam takes aligned ancient DNA (aDNA) fragments from multiple samples or contigs as input, computes their damage patterns, and clusters them. It outputs a representative damage profile per cluster, the probability that each sample belongs to a given cluster, and a Principal Component Analysis (PCA) of the per-sample damage patterns for fast visualisation. We evaluated AdDeam on both simulated and empirical datasets. AdDeam effectively distinguishes different damage levels, for example those of uracil-DNA glycosylase-treated samples and the sample-specific damage of specimens from different time periods, and it can also distinguish contigs containing modern fragments from contigs containing ancient fragments. It thereby provides a clear framework for aDNA authentication and facilitates large-scale analyses.
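The idea of clustering damage profiles and reporting a per-sample cluster probability can be sketched as follows. The abstract does not state which clustering model AdDeam uses, so this example swaps in a plain k-means with a softmax over distances as a stand-in. All names (`kmeans`, `memberships`, `scale`) and the toy profiles are hypothetical, and the PCA step is omitted.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next centroid: the profile farthest from all centroids chosen so far.
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in profiles:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def memberships(profile, centroids, scale=0.01):
    """Soft assignment: softmax over negative squared distances.

    `scale` is a hypothetical smoothing parameter, not an AdDeam option.
    """
    d = [sq_dist(profile, c) for c in centroids]
    d_min = min(d)  # shift for numerical stability
    w = [math.exp(-(x - d_min) / scale) for x in d]
    total = sum(w)
    return [x / total for x in w]


# Toy profiles: C->T frequency at the first three 5' positions (invented values).
samples = {
    "ancient_A": [0.30, 0.15, 0.08],
    "ancient_B": [0.28, 0.14, 0.07],
    "udg_A": [0.05, 0.01, 0.01],
    "modern_A": [0.01, 0.01, 0.01],
}

centroids = kmeans(list(samples.values()), k=2)
probs = {name: memberships(p, centroids) for name, p in samples.items()}
for name, p in probs.items():
    print(name, [round(x, 3) for x in p])
```

Each centroid plays the role of a representative damage profile for its cluster. With only two clusters, the low-damage UDG-like and modern-like toy profiles fall into the same group; separating them would need a larger `k`.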
AdDeam is publicly available at https://github.com/LouisPwr/AdDeam and can also be installed via Bioconda. It is implemented in Python and C++. All analysis scripts and datasets are available at https://github.com/LouisPwr/AdDeamAnalysis and on Zenodo (doi: 10.5281/zenodo.15052427).