Accurate fetal variant calling in the presence of maternal cell contamination.

Affiliations

Skolkovo Institute of Science and Technology, Skolkovo, Russia.

Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia.

Publication information

Eur J Hum Genet. 2020 Nov;28(11):1615-1623. doi: 10.1038/s41431-020-0697-6. Epub 2020 Jul 29.

Abstract

High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele at a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods "learn" the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
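
The abstract does not give the details of the Bayesian correction model. The Python sketch below is a simplified illustration only and is not the authors' model. It makes these assumptions:

- a single biallelic site;
- known maternal and paternal genotypes;
- a known MCC fraction `mcc`;
- a binomial read model with a fixed per-read error rate;
- a Mendelian prior that ignores de novo mutations.

All function names and parameters are hypothetical. The key idea is that, with contamination fraction c, the expected alternative-allele read fraction is (1 - c) * g_fetus / 2 + c * g_mother / 2. An MCC-unaware caller instead uses g_fetus / 2.

```python
from math import comb

# Alt-allele dosage: 0 = hom-ref, 1 = het, 2 = hom-alt.
GENOTYPES = (0, 1, 2)


def mendelian_prior(mother_gt: int, father_gt: int) -> dict[int, float]:
    """Prior over fetal genotypes from parental genotypes (assumes no de novo mutations)."""
    pm = mother_gt / 2  # probability the mother transmits the alt allele
    pf = father_gt / 2  # probability the father transmits the alt allele
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }


def expected_alt_fraction(fetal_gt: int, mother_gt: int, mcc: float) -> float:
    """Expected alt-read fraction when a fraction `mcc` of the DNA is maternal."""
    return (1 - mcc) * fetal_gt / 2 + mcc * mother_gt / 2


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k alt reads out of n when each read is alt with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fetal_genotype_posterior(
    alt_reads: int,
    depth: int,
    mother_gt: int,
    father_gt: int,
    mcc: float,
    error_rate: float = 0.01,  # hypothetical fixed per-read error rate; must be > 0
) -> dict[int, float]:
    """Posterior over fetal genotypes at one site, accounting for maternal contamination."""
    prior = mendelian_prior(mother_gt, father_gt)
    unnormalized = {}
    for g in GENOTYPES:
        p = expected_alt_fraction(g, mother_gt, mcc)
        # A sequencing error reports the other allele with probability error_rate.
        p = p * (1 - error_rate) + (1 - p) * error_rate
        unnormalized[g] = prior[g] * binomial_pmf(alt_reads, depth, p)
    total = sum(unnormalized.values())
    return {g: w / total for g, w in unnormalized.items()}


if __name__ == "__main__":
    # Heterozygous mother, hom-ref father, 8 alt reads out of 40.
    for mcc in (0.0, 0.3):
        post = fetal_genotype_posterior(8, 40, mother_gt=1, father_gt=0, mcc=mcc)
        best = max(post, key=post.get)
        print(f"mcc={mcc}: most probable fetal genotype (alt dosage) = {best}")
```

Run as a script, it compares the most probable fetal genotype for the same read counts with and without the contamination term.

- With `mcc=0.0`, the 8/40 alt reads favour a heterozygous fetus.
- With `mcc=0.3`, they are better explained by a hom-ref fetus plus alt reads contributed by the heterozygous mother.

This is the kind of error the abstract describes for MCC-unaware calling.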

Similar articles

Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data.
Am J Hum Genet. 2015 Aug 6;97(2):284-90. doi: 10.1016/j.ajhg.2015.07.002. Epub 2015 Jul 30.
