Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast.

Author information

Chen Jingxuan, Basting Preston J, Han Shunhua, Garfinkel David J, Bergman Casey M

Affiliations

Institute of Bioinformatics, University of Georgia, Athens, GA.

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA.

Publication information

bioRxiv. 2023 Mar 21:2023.02.13.528343. doi: 10.1101/2023.02.13.528343.

Abstract

BACKGROUND

Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.

RESULTS

We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.
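The recall and precision figures mentioned above can be illustrated with a minimal sketch. This is not McClintock's evaluation code: the `Insertion` record, the `precision_recall` function, the same-family matching rule, and the `window` tolerance are all assumptions made for illustration, and the coordinates are invented toy data. The sketch counts a prediction as correct when a simulated insertion of the same TE family lies on the same chromosome within `window` bp, and lets each simulated insertion be matched at most once.

```python
from typing import NamedTuple


class Insertion(NamedTuple):
    """Hypothetical record for one TE insertion call (not a McClintock type)."""
    chrom: str
    pos: int      # coordinate of the insertion site
    family: str   # TE family name, e.g. "TY1"


def precision_recall(predicted, truth, window=0):
    """Score predicted insertions against simulated (known) insertions.

    A prediction is a true positive if a not-yet-matched simulated insertion
    of the same family is on the same chromosome within `window` bp.
    Matching is greedy: the first qualifying simulated insertion is used.
    """
    unmatched = list(truth)
    true_pos = 0
    for pred in predicted:
        for i, sim in enumerate(unmatched):
            if (pred.chrom == sim.chrom
                    and pred.family == sim.family
                    and abs(pred.pos - sim.pos) <= window):
                true_pos += 1
                del unmatched[i]  # each simulated insertion matches once
                break
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Toy data (invented): two simulated insertions, three predictions.
truth = [Insertion("chrI", 1000, "TY1"), Insertion("chrII", 5000, "TY2")]
predicted = [
    Insertion("chrI", 1003, "TY1"),   # 3 bp from a simulated TY1 insertion
    Insertion("chrII", 9000, "TY2"),  # far from any simulated insertion
    Insertion("chrIV", 200, "TY4"),   # nothing simulated on this chromosome
]

exact = precision_recall(predicted, truth, window=0)
relaxed = precision_recall(predicted, truth, window=5)
print("exact:", exact)
print("within 5 bp:", relaxed)
```

Comparing a zero-width window with a relaxed one separates detectors that find the right neighborhood from those that find the precise insertion site, which is the distinction the abstract draws when it refers to identifying "the precise locations" of insertions.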

CONCLUSION

McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read whole genome sequencing (WGS) data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.


