Ebinger Arnt, Fischer Susanne, Höper Dirk
Institute for Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493 Greifswald-Insel Riems, Mecklenburg-Western Pomerania, Germany.
Institute of Infectology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493 Greifswald-Insel Riems, Mecklenburg-Western Pomerania, Germany.
Comput Struct Biotechnol J. 2020 Dec 26;19:732-742. doi: 10.1016/j.csbj.2020.12.040. eCollection 2021.
Metagenomics is a powerful tool to identify novel or unexpected pathogens, since it is generic and relatively unbiased. The limit of detection (LOD) is a critical parameter for the routine application of methods in the clinical diagnostic context. Although attempts to determine LODs for metagenomic next-generation sequencing (mNGS) have been made previously, these were only applicable to specific target species in defined sample matrices. Therefore, we developed and validated a generalized probability-based model to assess the sample-specific LOD of mNGS experiments (LODmNGS). Initial rarefaction analyses with datasets of Borna disease virus 1 human encephalitis cases revealed a stochastic behavior of virus read detection. Based on this, we transformed the Bernoulli formula to predict the minimal dataset size necessary to detect one virus read with a probability of 99%. We validated the formula with 30 datasets from diseased individuals, resulting in an accuracy of 99.1% and an average of 4.5 ± 0.4 viral reads found in the calculated minimal dataset size. By modeling the virus genome size, the virus concentration, and the total RNA concentration, we demonstrated that the main determinant of mNGS sensitivity is the ratio of virus to sample background. The predicted LODmNGS for the respective pathogenic virus in each dataset was congruent with the virus concentration determined by RT-qPCR. Theoretical assumptions were further confirmed by correlation analysis of mNGS and RT-qPCR data from the samples of the analyzed datasets. Because the concept of LODmNGS is generalized, this approach should guide the standardization of mNGS applications.
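The minimal dataset size can be illustrated with a short sketch. It is not the authors' published formula, which the abstract does not state; it assumes that each read is an independent Bernoulli draw that is viral with a fixed probability p (the virus fraction of the library), so that the probability of at least one viral read among n reads is 1 - (1 - p)^n. Solving 1 - (1 - p)^n >= 0.99 for n gives n >= ln(0.01) / ln(1 - p). The function names and the example value of p are hypothetical.

```python
import math


def min_dataset_size(viral_fraction: float, target_probability: float = 0.99) -> int:
    """Smallest number of reads n with P(at least one viral read) >= target.

    Assumption (not taken from the paper): reads are independent draws and
    each one is viral with probability `viral_fraction`.
    """
    if not 0.0 < viral_fraction < 1.0:
        raise ValueError("viral_fraction must lie strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    # 1 - (1 - p)^n >= target  <=>  n >= ln(1 - target) / ln(1 - p)
    n = math.log(1.0 - target_probability) / math.log1p(-viral_fraction)
    return math.ceil(n)


def prob_at_least_one_read(n_reads: int, viral_fraction: float) -> float:
    """P(at least one viral read among n_reads) under the same assumption."""
    return 1.0 - math.exp(n_reads * math.log1p(-viral_fraction))


if __name__ == "__main__":
    p = 1e-6  # hypothetical: one viral read per million reads
    n = min_dataset_size(p)
    print(f"minimal dataset size: {n} reads")
    print(f"detection probability at n: {prob_at_least_one_read(n, p):.4f}")
    print(f"expected viral reads at n: {n * p:.2f}")
```

Under this assumption, the expected number of viral reads in the minimal dataset, n * p, approaches ln(100), which is about 4.6 for small p. That is in line with the reported average of 4.5 ± 0.4 viral reads. It also shows why the result depends only on the virus fraction p, that is, on the ratio of virus to sample background.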