Obermeyer Fritz, Jankowiak Martin, Barkas Nikolaos, Schaffner Stephen F, Pyle Jesse D, Yurkovetskiy Lonya, Bosso Matteo, Park Daniel J, Babadi Mehrtash, MacInnis Bronwyn L, Luban Jeremy, Sabeti Pardis C, Lemieux Jacob E
Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA.
Pyro Committee, Linux AI & Data Foundation; 548 Market St, San Francisco, California 94104, USA.
medRxiv. 2022 Feb 16:2021.09.07.21263228. doi: 10.1101/2021.09.07.21263228.
Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization.
A Bayesian hierarchical model of all SARS-CoV-2 viral genomes predicts lineage fitness and identifies associated mutations.
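As an illustration of the multinomial logistic regression idea named in the abstract, the following minimal sketch computes relative lineage prevalence as a softmax over per-lineage log-odds that grow linearly in time. It is not the authors' implementation: the function names, the toy numbers, and the assumption that a lineage's growth rate is a simple sum of per-mutation effects are all hypothetical, and the sketch omits the hierarchical priors, regional structure, and Bayesian inference that the abstract describes.

```python
import numpy as np


def fitness_from_mutations(mutation_matrix, mutation_effects):
    """Assumed linear model: a lineage's growth rate is the sum of the
    effects of the mutations it carries (rows: lineages, cols: mutations)."""
    return mutation_matrix @ mutation_effects


def lineage_prevalence(intercepts, growth_rates, times):
    """Multinomial logistic (softmax) prevalence.

    intercepts, growth_rates: arrays of shape (n_lineages,)
    times: array of shape (n_times,)
    Returns an array of shape (n_times, n_lineages) whose rows sum to 1.
    """
    logits = intercepts[None, :] + np.outer(times, growth_rates)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


# Toy data (hypothetical): 3 lineages, 2 mutations.
mutation_matrix = np.array([
    [0.0, 0.0],  # ancestral lineage carries neither mutation
    [1.0, 0.0],  # lineage carrying mutation A
    [1.0, 1.0],  # lineage carrying mutations A and B
])
mutation_effects = np.array([0.05, 0.10])  # assumed per-day log-growth effects
growth_rates = fitness_from_mutations(mutation_matrix, mutation_effects)

intercepts = np.array([0.0, -3.0, -6.0])   # newer lineages start out rare
times = np.array([0.0, 30.0, 60.0, 90.0])  # days since the first observation
prev = lineage_prevalence(intercepts, growth_rates, times)
print(np.round(prev, 3))
```

In this toy setting the lineage carrying both mutations starts rare but overtakes the others as time advances, which is the qualitative behavior a growth-rate-based forecast relies on.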