Reference-free SNP calling: improved accuracy by preventing incorrect calls from repetitive genomic regions.

Affiliation

Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China.

Publication information

Biol Direct. 2012 Jun 8;7:17. doi: 10.1186/1745-6150-7-17.

DOI:10.1186/1745-6150-7-17
PMID:22682067
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3472322/
Abstract

BACKGROUND

Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in the non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome.

RESULTS

Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in the non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for the reference-free genotyping in diploid genomes with high repeat contents.
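The maximum likelihood genotyping step that iML builds on can be illustrated with a minimal sketch. The abstract does not specify the likelihood function, so the multinomial error model below, the fixed `error_rate`, and the function names are all assumptions chosen for illustration. This is not the authors' implementation and does not include the mixed Poisson/normal component.

```python
import math

BASES = "ACGT"

def genotype_log_likelihoods(counts, error_rate=0.01):
    """Log-likelihood of each diploid genotype at one site.

    counts: dict mapping nucleotide -> number of reads showing it.
    error_rate: assumed per-base sequencing error rate (hypothetical value).
    The multinomial coefficient is the same for every genotype, so it is omitted.
    """
    n = [counts.get(b, 0) for b in BASES]
    e = error_rate
    result = {}
    for i in range(4):
        for j in range(i, 4):
            if i == j:
                # Homozygote: reads match the allele unless a sequencing error occurred.
                probs = [1 - 3 * e / 4 if k == i else e / 4 for k in range(4)]
            else:
                # Heterozygote: each allele is sampled with probability about 1/2.
                probs = [0.5 - e / 4 if k in (i, j) else e / 4 for k in range(4)]
            result[BASES[i] + BASES[j]] = sum(
                c * math.log(p) for c, p in zip(n, probs)
            )
    return result

def call_genotype(counts, error_rate=0.01):
    """Return the genotype with the highest log-likelihood, plus all scores."""
    ll = genotype_log_likelihoods(counts, error_rate)
    best = max(ll, key=ll.get)
    return best, ll

if __name__ == "__main__":
    print(call_genotype({"A": 20, "G": 1})[0])
    print(call_genotype({"A": 10, "G": 11})[0])
```

A site with 20 reads of one base and a single read of another is called homozygous under this model, while a roughly even split is called heterozygous. Reads from collapsed repeat copies can mimic the second pattern, which is the failure mode the abstract says iML is designed to prevent.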

CONCLUSIONS

The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in the non-model organisms without a reference genome.
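One way to picture how composite read clusters from repetitive regions might be screened out is a depth-based filter. The sketch below is a simplified Poisson-only stand-in, not the mixed Poisson/normal model used by iML, whose details are not given in the abstract. The median-based depth estimate, the `alpha` cutoff, and the function names are assumptions for illustration.

```python
import math
from statistics import median

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed in log space for stability."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)

def flag_composite_clusters(depths, alpha=1e-3):
    """Flag clusters whose read depth is improbably high for a single-copy locus.

    depths: dict mapping cluster id -> read depth.
    alpha: assumed tail-probability cutoff (hypothetical value).
    The expected single-copy depth is estimated by the median, which is
    robust to a minority of high-depth repeat clusters.
    """
    lam = median(depths.values())
    return {
        cid for cid, d in depths.items()
        if poisson_upper_tail(d, lam) < alpha
    }

if __name__ == "__main__":
    depths = {"c1": 20, "c2": 22, "c3": 19, "c4": 21, "c5": 60}
    print(flag_composite_clusters(depths))
```

Clusters flagged this way would be excluded before genotyping, so that reads merged from several repeat copies are not mistaken for heterozygous sites.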
