Mostefai Fatima, Grenier Jean-Christophe, Poujol Raphaël, Hussin Julie
Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Québec, Canada.
Research Center, Montreal Heart Institute, Québec, Canada.
NAR Genom Bioinform. 2024 Nov 12;6(4):lqae145. doi: 10.1093/nargab/lqae145. eCollection 2024 Sep.
Understanding viral genome evolution during host infection is crucial for grasping viral diversity and evolution. Analyzing intra-host single nucleotide variants (iSNVs) offers insights into the emergence of new lineages, which is important for predicting and mitigating future viral threats. Despite the potential of next-generation sequencing (NGS), challenges persist, notably sequencing artifacts that produce false iSNVs. We developed a workflow to enhance iSNV detection in large NGS datasets, using over 130 000 SARS-CoV-2 libraries to distinguish true mutations from errors. Our approach integrates bioinformatics protocols, stringent quality control, and dimensionality reduction to tackle batch effects and improve the reliability of mutation detection. Additionally, we pioneer the application of the PHATE visualization approach to genomic data and introduce a methodology that quantifies how related groups of data points are represented within a two-dimensional space, improving the interpretation of clustering structure in terms of genetic similarity. This workflow advances accurate intra-host mutation detection, facilitating a deeper understanding of viral diversity and evolution.