SNP-PHAGE--High throughput SNP discovery pipeline.

Author information

Matukumalli Lakshmi K, Grefenstette John J, Hyten David L, Choi Ik-Young, Cregan Perry B, Van Tassell Curtis P

Affiliation

US Department of Agriculture, ARS, Beltsville Agricultural Research Center, Bovine Functional Genomics Laboratory, Beltsville, MD 20705, USA.

Publication information

BMC Bioinformatics. 2006 Oct 23;7:468. doi: 10.1186/1471-2105-7-468.

DOI:10.1186/1471-2105-7-468

PMID:17059604

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1626092/

Abstract

BACKGROUND

Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs or microsatellites) for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database so that they can be interrogated through queries or used to generate data for further analyses, such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable.
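As a rough illustration of why a relational store helps here, the sketch below loads a few SNP calls into a database and derives minor allele frequencies with one grouped query. It is not the SNP-PHAGE schema: the table names, column names and sample data are hypothetical, and SQLite (from the Python standard library) stands in for the MySQL database that the package uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snp (
            snp_id   INTEGER PRIMARY KEY,
            sts_name TEXT    NOT NULL,   -- sequence tagged site (amplicon)
            position INTEGER NOT NULL    -- position within the STS
        );
        CREATE TABLE genotype_call (
            snp_id   INTEGER NOT NULL REFERENCES snp(snp_id),
            genotype TEXT    NOT NULL,   -- name of the sequenced line
            allele   TEXT    NOT NULL
        );
        """
    )
    # Made-up example data: three SNP positions scored in four lines.
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?)",
        [(1, "STS_A", 101), (2, "STS_A", 245), (3, "STS_B", 57)],
    )
    conn.executemany(
        "INSERT INTO genotype_call VALUES (?, ?, ?)",
        [
            (1, "line1", "A"), (1, "line2", "A"), (1, "line3", "G"), (1, "line4", "G"),
            (2, "line1", "C"), (2, "line2", "C"), (2, "line3", "C"), (2, "line4", "T"),
            (3, "line1", "T"), (3, "line2", "T"), (3, "line3", "T"), (3, "line4", "T"),
        ],
    )
    return conn


def minor_allele_frequencies(conn):
    """Return {snp_id: minor allele frequency}; monomorphic sites get 0.0."""
    rows = conn.execute(
        "SELECT snp_id, allele, COUNT(*) FROM genotype_call "
        "GROUP BY snp_id, allele"
    ).fetchall()
    counts = {}
    for snp_id, allele, n in rows:
        counts.setdefault(snp_id, {})[allele] = n
    mafs = {}
    for snp_id, by_allele in counts.items():
        total = sum(by_allele.values())
        # With a single observed allele there is no minor allele.
        minor = min(by_allele.values()) if len(by_allele) > 1 else 0
        mafs[snp_id] = minor / total
    return mafs


if __name__ == "__main__":
    print(minor_allele_frequencies(build_demo_db()))
```

Keeping one row per call means that downstream analyses such as linkage disequilibrium or haplotype inference can pull exactly the subset they need with a query, rather than re-parsing trace output.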

RESULTS

We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl, and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available open source at http://bfgl.anri.barc.usda.gov/ML/snp-phage/.
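The idea of identifying common haplotypes within a sequence tagged site can be sketched as follows. This is a simplified illustration, not the method implemented in SNP-PHAGE (which is written in Perl): it assumes each genotype is an inbred line with a single allele per SNP position, skips lines with a missing call, and uses hypothetical names (`common_haplotypes`, `min_fraction`, `line1` and so on) and made-up data.

```python
from collections import Counter


def common_haplotypes(calls, min_fraction=0.1):
    """Count haplotypes within one STS.

    calls: {genotype_name: {snp_position: allele}}
    Returns (haplotype, count) pairs, most frequent first, for haplotypes
    carried by at least min_fraction of the fully called genotypes.
    """
    # Every SNP position observed in this STS, in sequence order.
    positions = sorted({pos for alleles in calls.values() for pos in alleles})
    counts = Counter()
    for alleles in calls.values():
        # Assumption: a genotype with any missing call is left out entirely.
        if all(pos in alleles for pos in positions):
            counts["|".join(alleles[pos] for pos in positions)] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(hap, n) for hap, n in counts.most_common() if n / total >= min_fraction]


if __name__ == "__main__":
    demo_calls = {
        "line1": {101: "A", 245: "C"},
        "line2": {101: "A", 245: "C"},
        "line3": {101: "G", 245: "T"},
        "line4": {101: "G", 245: "C"},
        "line5": {101: "A"},  # missing call at position 245, so skipped
    }
    print(common_haplotypes(demo_calls, min_fraction=0.5))
```

Heterozygous samples would need phasing before haplotypes could be counted this way, which is why the single-allele assumption is stated up front.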

CONCLUSION

SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.
