Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain.
Institut des Sciences de l'Evolution de Montpellier (UMR 5554, CNRS-UM-IRD-EPHE), Université de Montpellier, Place Eugène Bataillon, Montpellier, France.
Bioinformatics. 2020 Feb 15;36(4):1191-1197. doi: 10.1093/bioinformatics/btz727.
Transposable elements (TEs) make up a significant proportion of most genomes sequenced to date, and they account for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs is therefore crucial for a complete identification of the genetic differences among individuals, populations and species.
In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes reference TE insertions and estimates their population frequencies from short-read high-throughput sequencing data. In this new version, we have re-designed the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be run on longer short reads (e.g. reads >150 bp). We have added new filtering steps to increase genotyping accuracy, and new parameters that let the user control the minimum and maximum number of reads, as well as the minimum number of strains, required to genotype a TE insertion. We also show for the first time that T-lex3 provides accurate TE calls in a plant genome.
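The abstract does not spell out how the read-count and strain-count parameters interact. The Python sketch below illustrates one plausible way such thresholds could gate per-strain calls and a population-frequency estimate. It is not T-lex3's implementation: the `StrainCall` structure, the function names, the parameter names `min_reads`, `max_reads` and `min_strains`, the present/absent/polymorphic classification, and the choice to compute frequency as the fraction of informative (present or absent) strains carrying the insertion are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StrainCall:
    """Hypothetical read support for one TE insertion in one strain."""
    strain: str
    presence_reads: int  # reads supporting the insertion (assumed input)
    absence_reads: int   # reads supporting the absence of the insertion


def genotype(call: StrainCall, min_reads: int, max_reads: int) -> Optional[str]:
    """Classify one strain, or return None when read depth fails the filters.

    Assumption: depth below min_reads is too little evidence, and depth above
    max_reads is treated as suspicious (e.g. reads piling up in repeats).
    """
    depth = call.presence_reads + call.absence_reads
    if depth == 0 or depth < min_reads or depth > max_reads:
        return None
    if call.presence_reads > 0 and call.absence_reads > 0:
        return "polymorphic"
    return "present" if call.presence_reads > 0 else "absent"


def population_frequency(calls: List[StrainCall], *, min_reads: int,
                         max_reads: int, min_strains: int) -> Optional[float]:
    """Fraction of informative strains carrying the insertion.

    Assumption: only 'present' and 'absent' calls count as informative;
    filtered and polymorphic strains are excluded. Returns None when fewer
    than min_strains strains remain, i.e. the insertion is left ungenotyped.
    """
    genotypes = [genotype(c, min_reads, max_reads) for c in calls]
    informative = [g for g in genotypes if g in ("present", "absent")]
    if len(informative) < min_strains:
        return None
    return informative.count("present") / len(informative)


if __name__ == "__main__":
    calls = [
        StrainCall("s1", 5, 0),  # present
        StrainCall("s2", 0, 6),  # absent
        StrainCall("s3", 4, 0),  # present
        StrainCall("s4", 1, 0),  # filtered: depth below min_reads
        StrainCall("s5", 3, 3),  # polymorphic, excluded from the estimate
    ]
    print(population_frequency(calls, min_reads=3, max_reads=50, min_strains=3))
```

In this toy run, three of the five strains pass the filters with an unambiguous call, two of them carrying the insertion, so the estimate is 2/3; raising `min_strains` to 4 would leave the insertion ungenotyped.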
To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 individual TE insertions in humans, and 3067 individual TE insertions in the rice genome. We show that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. T-lex3 is available on GitHub: https://github.com/GonzalezLab/T-lex3.
Supplementary data are available at Bioinformatics online.