A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.

Author information

Tsoungui Obama Henri Christian Junior, Schneider Kristan Alexander

Affiliations

Department of Applied Computer- and Biosciences, University of Applied Sciences Mittweida, Mittweida, Germany.

Publication information

Front Epidemiol. 2022 Sep 23;2:943625. doi: 10.3389/fepid.2022.943625. eCollection 2022.

Abstract

The introduction of genomic methods has facilitated standardized molecular disease surveillance. For instance, SNP barcodes in malaria allow the characterization of haplotypes, their frequencies, and their prevalence, revealing temporal and spatial transmission patterns. A confounding factor is the presence of multiple genetically distinct pathogen variants within the same infection, known as multiplicity of infection (MOI). Disregarding ambiguous information, as is commonly done, leads to less confident and biased estimates. We introduce a statistical framework to obtain maximum-likelihood estimates (MLEs) of haplotype frequencies and prevalence alongside MOI from malaria SNP data, i.e., multiple biallelic marker loci. The number of model parameters increases geometrically with the number of genetic markers considered, and no closed-form solution exists for the MLE, which must therefore be obtained numerically. We use the expectation-maximization (EM) algorithm for this purpose, an efficient and easy-to-implement algorithm that yields a numerically stable solution. We also derive expressions for haplotype prevalence based on either all or just the unambiguous genetic information and compare both approaches. The latter corresponds to a biased estimate of prevalence. We assess the performance of our estimator by systematic numerical simulations assuming realistic sample sizes and various scenarios of transmission intensity. For reasonable sample sizes and numbers of loci, the method has little bias. As an example, we apply the method to a dataset from Cameroon on sulfadoxine-pyrimethamine resistance in Plasmodium falciparum malaria. The method is not confined to malaria and can be applied to any infectious disease with similar transmission behavior. An easy-to-use implementation of the method as an R script is provided.
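To make the notion of ambiguous information more concrete, here is a minimal Python sketch. It is not the authors' R script, and every name in it (`compatible_haplotypes`, `is_ambiguous`, `prevalence`, `lam`) is hypothetical. It assumes alleles are coded 0/1 and that a sample is recorded as the set of alleles detected at each locus. The prevalence function additionally assumes, purely for illustration, that MOI follows a zero-truncated Poisson distribution with parameter `lam` and that the haplotypes in an infection are drawn independently according to their frequencies; the abstract itself does not state these modelling details.

```python
from itertools import product
from math import expm1


def compatible_haplotypes(observation):
    """List every haplotype that could be present in a sample.

    `observation` holds one set per locus with the alleles detected there,
    e.g. ({0}, {0, 1}, {1}) for three biallelic loci.
    """
    return list(product(*(sorted(alleles) for alleles in observation)))


def is_ambiguous(observation):
    """A sample is ambiguous when two or more loci show both alleles:
    several different haplotype combinations then explain the same data."""
    return sum(len(alleles) > 1 for alleles in observation) >= 2


def prevalence(freq, lam):
    """Probability that a haplotype with frequency `freq` occurs in an
    infection, ASSUMING zero-truncated Poisson MOI with parameter `lam`
    (an illustrative assumption, not taken from the abstract)."""
    return 1.0 - expm1(lam * (1.0 - freq)) / expm1(lam)


if __name__ == "__main__":
    sample = ({0, 1}, {0, 1}, {1})
    print(compatible_haplotypes(sample))  # candidate haplotypes for this sample
    print(is_ambiguous(sample))
    print(2 ** 10)  # number of possible haplotypes for 10 biallelic loci
    print(prevalence(0.5, 1.0))
```

With n biallelic loci there are 2^n possible haplotypes, which is the geometric growth in parameters the abstract refers to. Under the stated assumptions, prevalence exceeds frequency whenever infections can carry more than one haplotype, which is why the two quantities are estimated separately.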
