

Clustering analysis for the evolutionary relationships of SARS-CoV-2 strains.

Affiliations

School of Computer Science, Shaanxi Normal University, Xian, 710119, China.

College of Life Sciences, Shaanxi Normal University, Xian, 710119, China.

Publication information

Sci Rep. 2024 Mar 18;14(1):6428. doi: 10.1038/s41598-024-57001-5.

Abstract

To explore the differences and relationships among the available SARS-CoV-2 strains and to predict their potential evolutionary direction, we use hierarchical clustering analysis to investigate the evolutionary relationships among SARS-CoV-2 strains, based on genomic sequences collected in China up to January 7, 2023. We encode the sequences of the existing SARS-CoV-2 strains as numerical data using the k-mer algorithm, then propose four methods for selecting a representative sample from each strain type to form the dataset for clustering analysis. Three hierarchical clustering algorithms, named Ward-Euclidean, Ward-Jaccard, and Average-Euclidean, are introduced by combining the Euclidean and Jaccard distances with the Ward and Average linkage clustering algorithms built into the OriginPro software. Experimental results reveal that the BF.28, BE.1.1.1, BA.5.3, and BA.5.6.4 strains exhibit distinct characteristics not observed in other types of SARS-CoV-2 strains, suggesting that they are the main potential sources from which future SARS-CoV-2 strains will evolve. Moreover, the BA.2.75, CH.1.1, BA.2, BA.5.1.3, BF.7, and B.1.1.214 strains demonstrate enhanced immune evasion, transmissibility, and pathogenicity. Hence, closely monitoring the evolutionary trends of these strains is crucial to mitigating their impact on public health and society as far as possible.
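The pipeline described in the abstract (k-mer encoding followed by hierarchical clustering under different distance and linkage combinations) can be illustrated with a minimal sketch. This is not the authors' implementation: the study used OriginPro, and the abstract does not state the value of k or the details of the encoding. The function names (`kmer_counts`, `euclidean`, `jaccard`, `agglomerate`), the choice k = 3, and the toy sequences are all assumptions made for illustration. Ward linkage is applied here through the Lance-Williams update, which is only formally applied when the input is a Jaccard rather than a Euclidean distance.

```python
from itertools import product
from math import sqrt

def kmer_counts(seq, k=3, alphabet="ACGT"):
    """Encode a sequence as counts of every possible k-mer (lexicographic order)."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with ambiguous bases (e.g. N) are skipped
        if j is not None:
            vec[j] += 1
    return vec

def euclidean(u, v):
    """Euclidean distance between two count vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def jaccard(u, v):
    """Jaccard distance on the presence/absence of k-mers."""
    a = {i for i, x in enumerate(u) if x}
    b = {i for i, x in enumerate(v) if x}
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def agglomerate(vectors, dist=euclidean, linkage="average"):
    """Naive agglomerative clustering using Lance-Williams updates.

    Returns a list of merges (members_a, members_b, height), in merge order.
    """
    if linkage not in ("average", "ward"):
        raise ValueError("linkage must be 'average' or 'ward'")
    clusters = {i: (i,) for i in range(len(vectors))}  # cluster id -> member indices
    d = {(i, j): dist(vectors[i], vectors[j])
         for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(vectors)
    while len(clusters) > 1:
        (i, j), h = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        updated = {}
        for k in clusters:
            if k in (i, j):
                continue
            dki = d[(min(k, i), max(k, i))]
            dkj = d[(min(k, j), max(k, j))]
            nk = len(clusters[k])
            if linkage == "average":
                updated[k] = (ni * dki + nj * dkj) / (ni + nj)
            else:  # Ward update expressed on (unsquared) distances
                updated[k] = sqrt(((ni + nk) * dki ** 2 + (nj + nk) * dkj ** 2
                                   - nk * h ** 2) / (ni + nj + nk))
        merges.append((clusters[i], clusters[j], h))
        merged = clusters.pop(i) + clusters.pop(j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        clusters[next_id] = merged
        for k, v in updated.items():
            d[(k, next_id)] = v  # next_id is always the largest id, so keys stay ordered
        next_id += 1
    return merges

# Toy sequences (hypothetical, not real SARS-CoV-2 genomes).
toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
vectors = [kmer_counts(s) for s in toy]
ward_euclidean = agglomerate(vectors, euclidean, "ward")
ward_jaccard = agglomerate(vectors, jaccard, "ward")
average_euclidean = agglomerate(vectors, euclidean, "average")
```

Each returned merge list plays the role of a dendrogram: reading it top to bottom shows which samples or groups were joined and at what height. For real genomes, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would likely be a more practical choice than this quadratic-memory sketch.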


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83a1/10948388/f804602eca42/41598_2024_57001_Fig1_HTML.jpg
