Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA.
Nucleic Acids Res. 2021 May 7;49(8):e44. doi: 10.1093/nar/gkab010.
Transposons are genomic parasites, and their new insertions can cause instability and spur the evolution of their host genomes. The rapid accumulation of short-read whole-genome sequencing data provides a great opportunity for studying new transposon insertions and their impacts on the host genome. Although many algorithms are available for detecting transposon insertions, the task remains challenging, and existing tools are not designed for identifying de novo insertions. Here, we present a new benchmark fly dataset based on PacBio long-read sequencing and a new method, TEMP2, for detecting germline insertions and measuring de novo 'singleton' insertion frequencies in eukaryotic genomes. Compared with existing tools, TEMP2 achieves high sensitivity and precision for detecting germline insertions on both simulated fly data and experimental fly and human data. Furthermore, TEMP2 can accurately assess the frequencies of de novo transposon insertions even with high levels of chimeric reads in simulated datasets; such chimeric reads often arise during the construction of short-read sequencing libraries. By applying TEMP2 to published data on hybrid dysgenic flies afflicted by de-repressed P-elements, we confirmed that new P-element insertions continue to arise in dysgenic offspring before they regain piRNAs for P-element repression. TEMP2 is freely available on GitHub: https://github.com/weng-lab/TEMP2.