An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data.

Author information

Bowles Harry, Kabiljo Renata, Al Khleifat Ahmad, Jones Ashley, Quinn John P, Dobson Richard J B, Swanson Chad M, Al-Chalabi Ammar, Iacoangeli Alfredo

Affiliations

Department of Basic and Clinical Neuroscience, King's College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom.

Department of Biostatistics and Health Informatics, King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom.

Publication information

Front Bioinform. 2023 Feb 8;2:1062328. doi: 10.3389/fbinf.2022.1062328. eCollection 2022.

Abstract

There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.
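The consensus approach suggested in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each tool's output has already been parsed into `(chromosome, position)` pairs; the tool names, the 500 bp merging window, and the `min_tools` threshold are hypothetical choices made only for illustration.

```python
def consensus_loci(calls_by_tool, window=500, min_tools=2):
    """Cluster nearby insertion calls and keep loci reported by >= min_tools tools.

    calls_by_tool: dict mapping a tool name to a list of (chrom, pos) calls.
    window: maximum gap in bp between neighbouring calls in one cluster
            (an assumed value, not taken from the paper).
    """
    # Flatten to (chrom, pos, tool) and sort so nearby calls are adjacent.
    flat = sorted(
        (chrom, pos, tool)
        for tool, calls in calls_by_tool.items()
        for chrom, pos in calls
    )

    consensus = []
    cluster = []  # calls belonging to the locus currently being built

    def flush():
        # Emit the current cluster if enough distinct tools support it.
        if not cluster:
            return
        tools = sorted({tool for _, _, tool in cluster})
        if len(tools) >= min_tools:
            positions = sorted(pos for _, pos, _ in cluster)
            middle = positions[len(positions) // 2]  # a middle call position
            consensus.append((cluster[0][0], middle, tools))

    for chrom, pos, tool in flat:
        # Start a new cluster on a chromosome change or a gap larger than window.
        if cluster and (chrom != cluster[-1][0] or pos - cluster[-1][1] > window):
            flush()
            cluster.clear()
        cluster.append((chrom, pos, tool))
    flush()
    return consensus


# Hypothetical calls from three tools.
calls = {
    "tool_a": [("chr1", 1000), ("chr2", 5000)],
    "tool_b": [("chr1", 1200), ("chr3", 700)],
    "tool_c": [("chr1", 1100)],
}
print(consensus_loci(calls))
```

In this toy input only the chr1 locus is reported by more than one tool, so it is the only one retained. Raising `min_tools` trades sensitivity for fewer false positives, and a suitable window would depend on the breakpoint precision of the tools being combined.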

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48d2/9945273/3d54390492ba/fbinf-02-1062328-g001.jpg
