A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

Authors

McCloskey Rosemary M, Poon Art F Y

Affiliations

BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada.

Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada.

Publication information

PLoS Comput Biol. 2017 Nov 13;13(11):e1005868. doi: 10.1371/journal.pcbi.1005868. eCollection 2017 Nov.

Abstract

Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse range of nonparametric clustering methods has been developed for this purpose. These methods are generally intuitive, rapid to compute, and scale readily to large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis, where individuals are sampled sooner post-infection, rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half as many individuals (46%) to clusters as the other methods did. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not in those extracted by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources.
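
The abstract names the Markov-modulated Poisson process without describing its mechanics. As a rough illustration of the general idea only, the Python sketch below simulates a two-state MMPP along a single time axis: a hidden state switches between a "slow" and a "fast" regime, and events (standing in for branching or transmission events) occur at a Poisson rate that depends on the current state. This is not the authors' implementation, which fits the process along a phylogenetic tree; the function name `simulate_mmpp` and all rate values are hypothetical choices made for the example.

```python
import random


def simulate_mmpp(rates, switch, t_end, state=0, seed=None):
    """Simulate a two-state Markov-modulated Poisson process on [0, t_end).

    rates[k]  : event rate while the hidden state is k (assumed values)
    switch[k] : rate of leaving hidden state k for the other state
    Returns a list of (event_time, hidden_state) tuples in time order.
    """
    rng = random.Random(seed)
    t = 0.0
    events = []
    while True:
        # Events and state switches compete as independent exponential clocks,
        # so the waiting time to whichever comes first has their summed rate.
        total = rates[state] + switch[state]
        t += rng.expovariate(total)
        if t >= t_end:
            return events
        if rng.random() < rates[state] / total:
            events.append((t, state))   # an event occurred in this state
        else:
            state = 1 - state           # the hidden state switched


if __name__ == "__main__":
    # Hypothetical rates: state 1 produces events ten times faster than state 0.
    events = simulate_mmpp(rates=(1.0, 10.0), switch=(0.1, 0.5),
                           t_end=100.0, seed=1)
    fast = sum(1 for _, s in events if s == 1)
    print(f"{len(events)} events, {fast} of them in the fast state")
```

In a clustering setting, the inference problem runs the other way: the hidden states are unknown and must be estimated from the observed spacing of events, with runs of closely spaced events suggesting the fast state.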

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f9b3/5703573/3bf718b35ae7/pcbi.1005868.g001.jpg
