EAGLE: Explicit Alternative Genome Likelihood Evaluator.

Authors

Kuo Tony, Frith Martin C, Sese Jun, Horton Paul

Affiliations

Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan.

AIST-Tokyo Tech RWBC-OIL, 2-12-1 Okayama, Meguro-ku, Tokyo, 152-8550, Japan.

Publication

BMC Med Genomics. 2018 Apr 20;11(Suppl 2):28. doi: 10.1186/s12920-018-0342-1.

DOI:10.1186/s12920-018-0342-1

PMID:29697369

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5918433/

Abstract

BACKGROUND

Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options.

RESULTS

Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual's genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers with those assigned by EAGLE on the task of ranking true putative variants, using both simulated data and benchmarks based on real genome sequencing. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark.
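The idea of scoring a candidate variant by comparing explicit genome hypotheses can be illustrated with a toy calculation. The sketch below is not EAGLE's implementation: it assumes a haploid genome, a single uniform per-base error rate, ungapped read placement, and a uniform prior over start positions, and every function name in it is hypothetical. It only shows the general pattern of building an alternative sequence from a candidate variant, summing each read's probability over all placements rather than trusting one alignment, and comparing the two hypotheses with a log-likelihood ratio.

```python
import math

def apply_variant(ref, pos, ref_allele, alt_allele):
    """Build the alternative-hypothesis sequence (0-based pos). Hypothetical helper."""
    assert ref[pos:pos + len(ref_allele)] == ref_allele, "ref allele mismatch"
    return ref[:pos] + alt_allele + ref[pos + len(ref_allele):]

def read_log_likelihood(read, genome, error_rate=0.01):
    """log P(read | genome), marginalized over every ungapped start position.

    Assumption: uniform prior over start positions and a uniform error rate,
    with an erroneous base equally likely to be any of the 3 other bases.
    """
    n_starts = len(genome) - len(read) + 1
    if n_starts <= 0:
        return float("-inf")
    terms = []
    for start in range(n_starts):
        logp = 0.0
        for i, base in enumerate(read):
            if base == genome[start + i]:
                logp += math.log(1.0 - error_rate)
            else:
                logp += math.log(error_rate / 3.0)
        terms.append(logp)
    # log-sum-exp over placements, then apply the uniform placement prior
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms)) - math.log(n_starts)

def log_likelihood_ratio(reads, ref, alt, error_rate=0.01):
    """Sum over reads of log P(read | alt) - log P(read | ref)."""
    return sum(
        read_log_likelihood(r, alt, error_rate) - read_log_likelihood(r, ref, error_rate)
        for r in reads
    )

# Toy example: a 2-base deletion (AGG -> A) and two reads that span it.
ref = "GATTACAGGCTTAACCGT"
alt = apply_variant(ref, 6, "AGG", "A")
reads = ["TACACTTA", "ACACTTAA"]
print(log_likelihood_ratio(reads, ref, alt))
```

A positive ratio means the reads are better explained by the sequence carrying the candidate variant, and a negative ratio favors the reference. A realistic evaluator would also need per-base quality scores, gapped alignment, and heterozygous hypotheses, none of which are modeled here.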

CONCLUSIONS

EAGLE ranked true variants higher than did the scores reported by the callers, and it can be used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle.

Similar articles

One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies.

PLoS Comput Biol. 2015 Aug 12;11(8):e1004448. doi: 10.1371/journal.pcbi.1004448. eCollection 2015 Aug.

Leveraging known genomic variants to improve detection of variants, especially close-by Indels.

Bioinformatics. 2018 Sep 1;34(17):2918-2926. doi: 10.1093/bioinformatics/bty183.

tarSVM: Improving the accuracy of variant calls derived from microfluidic PCR-based targeted next generation sequencing using a support vector machine.

BMC Bioinformatics. 2016 Jun 10;17(1):233. doi: 10.1186/s12859-016-1108-4.

Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data.

Bioinformatics. 2017 Apr 15;33(8):1147-1153. doi: 10.1093/bioinformatics/btw786.

Calling known variants and identifying new variants while rapidly aligning sequence data.

J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14.

Impact of post-alignment processing in variant discovery from whole exome data.

BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.

An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome.

BMC Bioinformatics. 2015 Nov 11;16:382. doi: 10.1186/s12859-015-0801-z.

VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering.

BMC Genomics. 2015 Oct 28;16:875. doi: 10.1186/s12864-015-2050-y.

Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.

PeerJ. 2020 Dec 7;8:e10501. doi: 10.7717/peerj.10501. eCollection 2020.

Cited by

Oryza genome evolution through a tetraploid lens.

Nat Genet. 2025 May;57(5):1287-1297. doi: 10.1038/s41588-025-02183-5. Epub 2025 Apr 28.

Immune evasion through mitochondrial transfer in the tumour microenvironment.

Nature. 2025 Feb;638(8049):225-236. doi: 10.1038/s41586-024-08439-0. Epub 2025 Jan 22.

Homoeologs in Allopolyploids: Navigating Redundancy as Both an Evolutionary Opportunity and a Technical Challenge-A Transcriptomics Perspective.

Genes (Basel). 2024 Jul 24;15(8):977. doi: 10.3390/genes15080977.

Subgenome evolutionary dynamics in allotetraploid ferns: insights from the gene expression patterns in the allotetraploid species (Thelypteridacea, Polypodiales).

Front Plant Sci. 2024 Jan 9;14:1286320. doi: 10.3389/fpls.2023.1286320. eCollection 2023.

A low-coverage 3' RNA-seq to detect homeolog expression in polyploid wheat.

NAR Genom Bioinform. 2023 Jul 12;5(3):lqad067. doi: 10.1093/nargab/lqad067. eCollection 2023 Sep.

Low impact of polyploidization on the transcriptome of synthetic allohexaploid wheat.

BMC Genomics. 2023 May 11;24(1):255. doi: 10.1186/s12864-023-09324-2.

SUP: a probabilistic framework to propagate genome sequence uncertainty, with applications.

NAR Genom Bioinform. 2023 Apr 24;5(2):lqad038. doi: 10.1093/nargab/lqad038. eCollection 2023 Jun.

ConanVarvar: a versatile tool for the detection of large syndromic copy number variation from whole-genome sequencing data.

BMC Bioinformatics. 2023 Feb 15;24(1):49. doi: 10.1186/s12859-023-05154-x.

A Natural Low Phytic Acid Finger Millet Accession Significantly Improves Iron Bioavailability in Indian Women.

Front Nutr. 2022 Mar 24;8:791392. doi: 10.3389/fnut.2021.791392. eCollection 2021.

Gradual evolution of allopolyploidy in Arabidopsis suecica.

Nat Ecol Evol. 2021 Oct;5(10):1367-1381. doi: 10.1038/s41559-021-01525-w. Epub 2021 Aug 19.

References

On genomic repeats and reproducibility.

Bioinformatics. 2016 Aug 1;32(15):2243-7. doi: 10.1093/bioinformatics/btw139. Epub 2016 Mar 11.

Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

PLoS One. 2016 Mar 22;11(3):e0151664. doi: 10.1371/journal.pone.0151664. eCollection 2016.

Systematic comparison of variant calling pipelines using gold standard personal exome variants.

Sci Rep. 2015 Dec 7;5:17875. doi: 10.1038/srep17875.

An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

Genome Res. 2015 Jun;25(6):918-25. doi: 10.1101/gr.176552.114. Epub 2015 Apr 16.

Unified representation of genetic variants.

Bioinformatics. 2015 Jul 1;31(13):2202-4. doi: 10.1093/bioinformatics/btv112. Epub 2015 Feb 19.

From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline.

Curr Protoc Bioinformatics. 2013;43(1110):11.10.1-11.10.33. doi: 10.1002/0471250953.bi1110s43.

Reducing INDEL calling errors in whole genome and exome sequencing data.

Genome Med. 2014 Oct 28;6(10):89. doi: 10.1186/s13073-014-0089-z. eCollection 2014.

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.

Nat Genet. 2014 Aug;46(8):912-918. doi: 10.1038/ng.3036. Epub 2014 Jul 13.

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls.

Nat Biotechnol. 2014 Mar;32(3):246-51. doi: 10.1038/nbt.2835. Epub 2014 Feb 16.

Calreticulin gene exon 9 frameshift mutations in patients with thrombocytosis.

Leukemia. 2014 May;28(5):1152-4. doi: 10.1038/leu.2013.382. Epub 2013 Dec 24.
