PySNV for complex intra-host variation detection.

Affiliations

Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China.

University of Chinese Academy of Sciences, Beijing 100101, China.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae116.

Abstract

MOTIVATION

Intra-host variants are genetic variations or mutations that occur within an individual host organism. They are typically studied in viruses, bacteria, and other pathogens to understand pathogen evolution. Intra-host variants are also studied in tumor biology and mitochondrial biology to characterize somatic mutations and inherited heteroplasmic mutations. Intra-host variants can involve long insertions, deletions, and combinations of different mutation types, which makes them challenging to identify. The performance of current methods in detecting complex intra-host variants is unknown.

RESULTS

First, we simulated a dataset comprising 10 samples with 1869 intra-host variants involving various mutation patterns and benchmarked current variant detection software. Although current software detected most variants, with F1-scores between 0.76 and 0.97, its performance on long indels and low-frequency variants was limited. We therefore developed a new tool, PySNV, for detecting complex intra-host variations. On the simulated dataset, PySNV successfully detected 1863 variants (F1-score: 0.99) and showed the highest Pearson correlation coefficient (PCC: 0.99) with the ground truth in predicting variant frequencies. PySNV delivered promising performance even for long indels and low-frequency variants, while maintaining computational speed comparable to other methods. Finally, we tested PySNV on SARS-CoV-2 replicate sequencing data and found that it reported 21% more variants than LoFreq, the best-performing benchmarked software, while showing higher consistency (62% versus 54%) within replicates. The discrepancies were mostly in low-depth regions and low-frequency variants.
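As an illustration of the two benchmark metrics reported above, the following minimal sketch computes an F1-score from a truth set and a call set, and a Pearson correlation between true and estimated variant frequencies. It is not taken from PySNV or from the paper's evaluation scripts: the variant keys, the frequencies, and the choice to compute the correlation only over variants present in both sets are assumptions made for this toy example.

```python
from math import sqrt

def f1_score(truth, called):
    """F1 from two sets of variant keys (assumed key: chrom, pos, ref, alt)."""
    tp = len(truth & called)   # variants in both sets
    fp = len(called - truth)   # called but not in the truth set
    fn = len(truth - called)   # in the truth set but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical toy data: variant key -> allele frequency.
truth = {
    ("ref", 241, "C", "T"): 0.10,
    ("ref", 3037, "C", "T"): 0.25,
    ("ref", 11287, "GTCTGGTTTT", "G"): 0.40,  # a deletion
    ("ref", 23403, "A", "G"): 0.80,           # missed by the caller below
}
calls = {
    ("ref", 241, "C", "T"): 0.12,
    ("ref", 3037, "C", "T"): 0.24,
    ("ref", 11287, "GTCTGGTTTT", "G"): 0.38,
    ("ref", 28881, "G", "A"): 0.05,           # false positive
}

f1 = f1_score(set(truth), set(calls))

# Assumption: frequency agreement is measured on shared variants only.
shared = sorted(set(truth) & set(calls))
pcc = pearson([truth[k] for k in shared], [calls[k] for k in shared])

print(f"F1 = {f1:.2f}, PCC = {pcc:.3f}")
```

In this toy case the caller recovers three of four true variants and adds one false positive, so precision and recall are both 0.75. How a real benchmark matches indels (for example, after left-normalization) affects these counts and is not addressed here.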

AVAILABILITY AND IMPLEMENTATION

https://github.com/bnuLyndon/PySNV/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/133a/10937218/b6b8c4267b0e/btae116f1.jpg
