Department of Biology, Stanford University, 371 Serra St, Stanford, CA 94305-3020, USA.
Nucleic Acids Res. 2011 Mar;39(6):e36. doi: 10.1093/nar/gkq1291. Epub 2010 Dec 21.
Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotating TE copies in reference genomes. Falling sequencing costs and newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study the population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence or absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of the population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex in detecting the presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low coverage T-lex produces accurate results for the TE copies that it can identify reliably, but that the rate of 'no data' calls increases as coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome, provided that a reference genome, individual TE copy annotations and NGS data are available.
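As an illustration of how per-strain presence/absence calls could be turned into a population frequency estimate, the following is a minimal sketch and not the T-lex implementation. It assumes each strain yields exactly one of three calls per TE insertion ('present', 'absent' or 'no_data') and that 'no data' strains are simply excluded from the denominator; the call labels, the function name `estimate_frequency` and the example TE identifiers are hypothetical.

```python
from collections import Counter


def estimate_frequency(calls):
    """Estimate the population frequency of one TE insertion.

    `calls` holds one call per strain: 'present', 'absent' or 'no_data'.
    Assumption (not from the abstract): 'no_data' strains are ignored.
    Returns None when no strain gives an informative call.
    """
    counts = Counter(calls)
    informative = counts["present"] + counts["absent"]
    if informative == 0:
        return None
    return counts["present"] / informative


# Hypothetical calls for two annotated TE insertions across a few strains.
calls_by_te = {
    "TE_example_1": ["present", "absent", "present", "no_data"],
    "TE_example_2": ["no_data", "no_data"],
}

frequencies = {te: estimate_frequency(calls) for te, calls in calls_by_te.items()}
```

Under this assumption, an insertion with only 'no data' calls has no estimate at all rather than a frequency of zero, which mirrors the point that lower coverage raises the rate of 'no data' calls without necessarily making the remaining calls less accurate.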