SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies.

Affiliation

John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA.

Publication information

Bioinformatics. 2010 Nov 15;26(22):2803-10. doi: 10.1093/bioinformatics/btq526. Epub 2010 Sep 21.

DOI:10.1093/bioinformatics/btq526
PMID:20861027
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2971572/
Abstract

MOTIVATION

Next-generation sequencing presents several statistical challenges, with one of the most fundamental being determining an individual's genotype from multiple aligned short read sequences at a position. Some simple approaches for genotype calling apply fixed filters, such as calling a heterozygote if more than a specified percentage of the reads have variant nucleotide calls. Other genotype-calling methods, such as MAQ and SOAPsnp, are implementations of Bayes classifiers in that they classify genotypes using posterior genotype probabilities.
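To make the contrast concrete, here is a minimal Python sketch of the two calling styles described above at a single biallelic site. It assumes each read independently shows the variant nucleotide with probability ε for a homozygous-reference genotype, 0.5 for a heterozygote and 1 - ε for a homozygous-variant genotype. The function names (`filter_call`, `bayes_call`), the thresholds, the prior values and ε are illustrative assumptions; MAQ and SOAPsnp use their own, more detailed models.

```python
import math


def variant_read_probs(eps):
    """Assumed read model: P(read shows variant) for genotypes 0, 1, 2.

    0 = homozygous reference, 1 = heterozygote, 2 = homozygous variant.
    """
    return (eps, 0.5, 1 - eps)


def filter_call(n, x, het_min=0.2, hom_min=0.8):
    """Fixed filter: call from the fraction x/n of variant reads (thresholds assumed)."""
    if n == 0:
        return None  # no coverage, no call
    frac = x / n
    if frac >= hom_min:
        return 2  # homozygous variant
    if frac > het_min:
        return 1  # heterozygote: more than het_min of reads are variant
    return 0  # homozygous reference


def bayes_call(n, x, priors=(0.998, 0.001, 0.001), eps=0.01):
    """Bayes classifier: maximum-posterior genotype, priors and eps fixed a priori.

    The binomial coefficient C(n, x) is identical for every genotype, so it is
    dropped; the values below are unnormalized log posteriors.
    """
    log_post = [
        math.log(prior) + x * math.log(p) + (n - x) * math.log(1 - p)
        for prior, p in zip(priors, variant_read_probs(eps))
    ]
    return max(range(3), key=lambda g: log_post[g])


if __name__ == "__main__":
    # Four reads, two of them variant.
    print(filter_call(4, 2), bayes_call(4, 2))  # 1 0
```

With only four reads, two of them variant, the fixed filter calls a heterozygote, while the Bayes classifier with a strong prior on the reference homozygote calls genotype 0; with uniform priors (`priors=(1/3, 1/3, 1/3)`) it calls a heterozygote instead. This dependence on parameters specified a priori is the issue SeqEM addresses by estimating them from the sample.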

RESULTS

Here, we propose a novel genotype-calling algorithm that, in contrast to the other methods, estimates parameters underlying the posterior probabilities in an adaptive way rather than arbitrarily specifying them a priori. The algorithm, which we call SeqEM, applies the well-known Expectation-Maximization algorithm to an appropriate likelihood for a sample of unrelated individuals with next-generation sequence data, leveraging information from the sample to estimate genotype probabilities and the nucleotide-read error rate. We demonstrate using analytic calculations and simulations that SeqEM results in genotype-call error rates as small as or smaller than filtering approaches and MAQ. We also apply SeqEM to exome sequence data in eight related individuals and compare the results to genotypes from an Illumina SNP array, showing that SeqEM behaves well in real data that deviates from idealized assumptions.
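The EM idea can be sketched in Python for one site in a sample of unrelated individuals. This is a simplified illustration, not the published SeqEM software. It assumes a binomial read model in which a read shows the variant nucleotide with probability ε, 0.5 or 1 - ε for genotypes AA, Aa and aa; three free genotype frequencies (no Hardy-Weinberg constraint); and one error rate ε shared by all reads. The function `seqem_sketch`, its parameters and the clamping bounds on ε are hypothetical.

```python
import math


def seqem_sketch(reads, eps=0.01, max_iter=500, tol=1e-10):
    """Hypothetical EM genotype caller in the spirit of SeqEM (simplified).

    reads: list of (n, x) per individual; n = aligned reads, x = variant reads.
    Returns (genotype frequencies [AA, Aa, aa], error rate eps, MAP calls 0/1/2).
    """
    if not reads:
        raise ValueError("need at least one individual")
    freqs = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(max_iter):
        # E-step: posterior P(genotype | reads) for each individual. C(n, x) is
        # the same for all genotypes, so it cancels and is omitted.
        post = []
        for n, x in reads:
            logs = [
                math.log(max(f, 1e-300)) + x * math.log(p) + (n - x) * math.log(1 - p)
                for f, p in zip(freqs, (eps, 0.5, 1 - eps))
            ]
            top = max(logs)  # log-sum-exp shift for numerical stability
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            post.append([v / s for v in w])

        # M-step: frequencies are average posteriors; eps is the expected share
        # of "wrong" reads (variant reads in AA, reference reads in aa).
        m = len(reads)
        new_freqs = [sum(w[g] for w in post) / m for g in range(3)]
        num = sum(w[0] * x + w[2] * (n - x) for w, (n, x) in zip(post, reads))
        den = sum((w[0] + w[2]) * n for w, (n, _) in zip(post, reads))
        new_eps = num / den if den > 0 else eps
        new_eps = min(max(new_eps, 1e-6), 0.25)  # assumed bounds keep logs finite

        delta = max(abs(a - b) for a, b in zip(freqs + [eps], new_freqs + [new_eps]))
        freqs, eps = new_freqs, new_eps
        if delta < tol:
            break

    calls = [max(range(3), key=lambda g: w[g]) for w in post]
    return freqs, eps, calls


if __name__ == "__main__":
    sample = [(20, 0), (20, 1), (20, 10), (20, 9), (20, 19), (20, 20), (20, 0), (20, 11)]
    freqs, eps, calls = seqem_sketch(sample)
    print(calls)  # [0, 0, 1, 1, 2, 2, 0, 1]
    print(round(eps, 3), [round(f, 3) for f in freqs])
```

The E-step turns the current genotype frequencies and ε into per-individual posterior genotype probabilities; the M-step re-estimates the frequencies as average posteriors and ε from reads that disagree with a homozygous genotype, weighted by how likely each individual is to be homozygous. In the example, two erroneous reads among roughly 100 reads from homozygous individuals give an error-rate estimate near 0.02, learned from the sample rather than fixed in advance.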

CONCLUSION

SeqEM offers an improved, robust and flexible genotype-calling approach that can be widely applied in the next-generation sequencing studies.

AVAILABILITY AND IMPLEMENTATION

Software for SeqEM is freely available from our website: www.hihg.org under Software Download.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf2b/2971572/6b1282be2a38/btq526f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf2b/2971572/acf956222cca/btq526f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf2b/2971572/c7c5824132fd/btq526f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf2b/2971572/f1d419867abe/btq526f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf2b/2971572/dcc8e9841ccd/btq526f4.jpg

References

1
Exome sequencing of a multigenerational human pedigree.
PLoS One. 2009 Dec 14;4(12):e8232. doi: 10.1371/journal.pone.0008232.
2
Exome sequencing identifies the cause of a mendelian disorder.
Nat Genet. 2010 Jan;42(1):30-5. doi: 10.1038/ng.499. Epub 2009 Nov 13.
3
Massively parallel sequencing: the next big thing in genetic medicine.
Am J Hum Genet. 2009 Aug;85(2):142-54. doi: 10.1016/j.ajhg.2009.06.022.
4
Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.
Genome Res. 2009 Sep;19(9):1606-15. doi: 10.1101/gr.092213.109. Epub 2009 Jun 12.
5
SNP detection for massively parallel whole-genome resequencing.
Genome Res. 2009 Jun;19(6):1124-32. doi: 10.1101/gr.088013.108. Epub 2009 May 6.
6
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
Genome Res. 2008 Nov;18(11):1851-8. doi: 10.1101/gr.078212.108. Epub 2008 Aug 19.
7
Simple and efficient analysis of disease association with missing genotype data.
Am J Hum Genet. 2008 Feb;82(2):444-52. doi: 10.1016/j.ajhg.2007.11.004.
8
Multiplex amplification of large sets of human exons.
Nat Methods. 2007 Nov;4(11):931-6. doi: 10.1038/nmeth1110. Epub 2007 Oct 14.
9
A new multipoint method for genome-wide association studies by imputation of genotypes.
Nat Genet. 2007 Jul;39(7):906-13. doi: 10.1038/ng2088. Epub 2007 Jun 17.
10
Score tests for association between traits and haplotypes when linkage phase is ambiguous.
Am J Hum Genet. 2002 Feb;70(2):425-34. doi: 10.1086/338688. Epub 2001 Dec 27.