Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies.

Author information

Machado Moara, Magalhães Wagner Cs, Sene Allan, Araújo Bruno, Faria-Campos Alessandra C, Chanock Stephen J, Scott Leandro, Oliveira Guilherme, Tarazona-Santos Eduardo, Rodrigues Maira R

Affiliations

Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av Antonio Carlos 6627, Pampulha, Caixa Postal 486, Belo Horizonte, MG, CEP 31270-910, Brazil.

Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Av Antonio Carlos 6627, Pampulha, Belo Horizonte, MG, CEP 31270-910, Brazil.

Publication information

Investig Genet. 2011 Feb 1;2(1):3. doi: 10.1186/2041-2223-2-3.

DOI:10.1186/2041-2223-2-3
PMID:21284835
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3041995/
Abstract

BACKGROUND

Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows unbiased screening for variation in a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because automatic sequencers based on capillary electrophoresis are widely available and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between them lies a set of basic but error-prone tasks that must be performed on the re-sequencing data.
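Phred, the base caller at the start of this toolchain, attaches a quality value Q to every base call, defined as Q = -10 log10(P), where P is the estimated probability that the call is wrong. The minimal Python sketch below converts between the two scales; the function names are illustrative and are not part of Phred itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        # Q20 corresponds to an estimated 1-in-100 chance that the base call is wrong.
        print(f"Q{q}: estimated error probability {phred_to_error_prob(q):g}")
```

Because the scale is logarithmic, each 10-point increase in Q means a tenfold lower estimated error rate.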

RESULTS

In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp.
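Steps (1) to (3) can be sketched in Python as follows. This is not the authors' implementation: it assumes a hypothetical record format in which each Polyphred genotype call has already been mapped to reference coordinates as (contig_id, individual, reference_position, genotype), with diploid genotypes written like "A/G". Calls from contigs sharing the reference are merged, disagreements are reported as genotyping inconsistencies, and the agreed calls are arranged as a matrix with individuals as rows and segregating sites as columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical call record: (contig_id, individual, reference_position, genotype),
# e.g. ("contig1", "ind1", 101, "A/G"). Positions are already in reference coordinates.
Call = Tuple[str, str, int, str]


def normalize(genotype: str) -> str:
    """Treat diploid genotypes as unordered, so 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def consolidate(calls: Iterable[Call]):
    """Merge calls from contigs sharing a reference (step 1) and flag conflicts (step 2).

    Returns (merged, conflicts): merged maps (individual, position) -> genotype where
    all contigs agree; conflicts lists (individual, position, genotypes) where
    overlapping contigs disagree. Conflicting sites are left out of merged.
    """
    seen: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for _contig, individual, pos, genotype in calls:
        seen[(individual, pos)].add(normalize(genotype))
    merged: Dict[Tuple[str, int], str] = {}
    conflicts: List[Tuple[str, int, List[str]]] = []
    for (individual, pos), genotypes in seen.items():
        if len(genotypes) == 1:
            merged[(individual, pos)] = next(iter(genotypes))
        else:
            conflicts.append((individual, pos, sorted(genotypes)))
    return merged, sorted(conflicts)


def genotype_matrix(merged: Dict[Tuple[str, int], str], missing: str = "?"):
    """Build a matrix with individuals as rows and segregating sites as columns (step 3)."""
    individuals = sorted({ind for ind, _ in merged})
    positions = sorted({pos for _, pos in merged})
    segregating = []
    for pos in positions:
        # A site is segregating if more than one allele is observed across individuals.
        alleles = {a for ind in individuals if (ind, pos) in merged
                   for a in merged[(ind, pos)].split("/")}
        if len(alleles) > 1:
            segregating.append(pos)
    rows = [[merged.get((ind, pos), missing) for pos in segregating]
            for ind in individuals]
    return individuals, segregating, rows


if __name__ == "__main__":
    calls = [
        ("contig1", "ind1", 101, "A/A"),
        ("contig1", "ind2", 101, "A/G"),
        ("contig2", "ind2", 101, "G/A"),  # overlapping contig, same genotype
        ("contig1", "ind1", 250, "C/C"),
        ("contig1", "ind2", 250, "C/C"),  # monomorphic site
        ("contig2", "ind1", 300, "T/T"),
        ("contig2", "ind2", 300, "T/C"),
        ("contig3", "ind2", 300, "T/T"),  # disagrees with contig2
    ]
    merged, conflicts = consolidate(calls)
    print("conflicts:", conflicts)
    individuals, sites, rows = genotype_matrix(merged)
    print("\t".join(["id"] + [str(s) for s in sites]))
    for ind, row in zip(individuals, rows):
        print("\t".join([ind] + row))
```

In this toy input, position 250 is dropped as monomorphic, and position 300 leaves the matrix because the conflicting call for ind2 is excluded and only one allele remains; a real workflow would need to resolve such conflicts rather than silently drop them.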

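Step (4) produces PHASE input. The Python sketch below assumes the layout shown in the example input of the PHASE 2.1 manual (number of individuals, number of loci, a P line with positions, a locus-type line using S for biallelic SNPs, then an ID line and two allele lines per individual, with ? marking missing SNP data); check it against the documentation of the PHASE version you use. The function name and input layout are hypothetical, alleles are recoded to 0/1 per site, and only diploid, biallelic data are handled.

```python
from typing import Dict, List, Tuple


def write_phase_input(individuals: List[str], positions: List[int],
                      rows: List[List[str]]) -> Tuple[str, List[Dict[str, int]]]:
    """Render diploid genotypes (e.g. 'A/G') at biallelic SNPs as PHASE-style input text.

    rows[i][j] is individual i's genotype at positions[j], or '?' if missing.
    Returns the file text plus one allele->code map per site (needed later to
    translate PHASE haplotypes back into nucleotides).
    """
    # Recode alleles to 0/1 per site, in order of first appearance.
    codes: List[Dict[str, int]] = [{} for _ in positions]
    for row in rows:
        for j, genotype in enumerate(row):
            if genotype != "?":
                for allele in genotype.split("/"):
                    codes[j].setdefault(allele, len(codes[j]))
    if any(len(site_codes) > 2 for site_codes in codes):
        raise ValueError("this sketch only handles biallelic SNPs")

    lines = [
        str(len(individuals)),                  # number of individuals
        str(len(positions)),                    # number of loci
        "P " + " ".join(map(str, positions)),   # positions line
        "S" * len(positions),                   # locus types: S = biallelic SNP
    ]
    for ind, row in zip(individuals, rows):
        first, second = [], []
        for j, genotype in enumerate(row):
            if genotype == "?":
                a_code = b_code = "?"            # missing SNP data
            else:
                a, b = genotype.split("/")
                a_code, b_code = str(codes[j][a]), str(codes[j][b])
            first.append(a_code)
            second.append(b_code)
        lines += [f"#{ind}", " ".join(first), " ".join(second)]
    return "\n".join(lines) + "\n", codes


if __name__ == "__main__":
    text, codes = write_phase_input(
        ["ind1", "ind2"], [101, 300], [["A/A", "T/T"], ["A/G", "?"]])
    print(text, end="")
    print(codes)
```

Returning the per-site code maps keeps the 0/1 recoding reversible, which is what the haplotype reconstruction in step (5) depends on.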
CONCLUSION

We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially decreased the time required for sequencing analyses and made the process more controlled, eliminating several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
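Step (5) of the pipeline rebuilds full-length haplotypes, because PHASE reports alleles only at polymorphic sites while DNAsp needs complete sequences. The minimal Python sketch below assumes the PHASE output has already been parsed and decoded back to nucleotides (a hypothetical upstream step) and that all variants are single-nucleotide substitutions; it copies every monomorphic site from the reference and writes FASTA, one of the alignment formats DNAsp can read. For haploid data, where no phasing is needed, the same function could take the genotype calls directly as haplotypes.

```python
from typing import Dict, List


def rebuild_haplotypes(reference: str, positions: List[int],
                       haplotypes: Dict[str, str]) -> Dict[str, str]:
    """Write phased alleles at segregating sites into copies of the reference.

    positions: 1-based reference coordinates of the segregating sites.
    haplotypes: name -> one allele per segregating site, in the same order.
    Monomorphic sites are copied from the reference unchanged.
    """
    rebuilt: Dict[str, str] = {}
    for name, alleles in haplotypes.items():
        if len(alleles) != len(positions):
            raise ValueError(f"{name}: expected {len(positions)} alleles, got {len(alleles)}")
        seq = list(reference)
        for pos, allele in zip(positions, alleles):
            if not 1 <= pos <= len(reference):
                raise ValueError(f"position {pos} lies outside the reference")
            seq[pos - 1] = allele
        rebuilt[name] = "".join(seq)
    return rebuilt


def to_fasta(seqs: Dict[str, str]) -> str:
    """Format sequences as FASTA text, one record per haplotype."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


if __name__ == "__main__":
    haps = rebuild_haplotypes("ACGTACGTAC", [2, 7], {"ind1_1": "CG", "ind1_2": "TA"})
    print(to_fasta(haps), end="")
```

Validating the allele count and positions up front catches the kind of silent misalignment between PHASE output and the reference that is easy to introduce when this step is done by hand.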

Similar articles

1
Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies.
Investig Genet. 2011 Feb 1;2(1):3. doi: 10.1186/2041-2223-2-3.
2
PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing.
Nucleic Acids Res. 1997 Jul 15;25(14):2745-51. doi: 10.1093/nar/25.14.2745.
3
PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
Genet Epidemiol. 2017 Jul;41(5):375-387. doi: 10.1002/gepi.22048. Epub 2017 May 31.
4
Assembling genomic DNA sequences with PHRAP.
Curr Protoc Bioinformatics. 2007 Mar;Chapter 11:Unit11.4. doi: 10.1002/0471250953.bi1104s17.
5
Automating the identification of DNA variations using quality-based fluorescence re-sequencing: analysis of the human mitochondrial genome.
Nucleic Acids Res. 1998 Feb 15;26(4):967-73. doi: 10.1093/nar/26.4.967.
6
Consed: a graphical tool for sequence finishing.
Genome Res. 1998 Mar;8(3):195-202. doi: 10.1101/gr.8.3.195.
7
Consed: a graphical editor for next-generation sequencing.
Bioinformatics. 2013 Nov 15;29(22):2936-7. doi: 10.1093/bioinformatics/btt515. Epub 2013 Aug 31.
8
PolyPhred analysis software for mutation detection from fluorescence-based sequence data.
Curr Protoc Hum Genet. 2008 Oct;Chapter 7:Unit 7.16. doi: 10.1002/0471142905.hg0716s59.
9
mInDel: a high-throughput and efficient pipeline for genome-wide InDel marker development.
BMC Genomics. 2016 Apr 14;17:290. doi: 10.1186/s12864-016-2614-5.
10
Characterizing bias in population genetic inferences from low-coverage sequencing data.
Mol Biol Evol. 2014 Mar;31(3):723-35. doi: 10.1093/molbev/mst229. Epub 2013 Nov 27.

Cited by

1
Direct Evidence of Microbial Sunscreen Production by Scum-Forming Cyanobacteria in the Baltic Sea.
Environ Microbiol Rep. 2025 Feb;17(1):e70056. doi: 10.1111/1758-2229.70056.
2
Integrated identification of growth pattern and taxon of bacterium in gut microbiota via confocal fluorescence imaging-oriented single-cell sequencing.
mLife. 2022 Sep 26;1(3):350-358. doi: 10.1002/mlf2.12041. eCollection 2022 Sep.
3
Trypanosoma cruzi iron superoxide dismutases: insights from phylogenetics to chemotherapeutic target assessment.
Parasit Vectors. 2022 Jun 6;15(1):194. doi: 10.1186/s13071-022-05319-2.
4
Human papillomavirus type 13: Genome amplification and characterization data.
Data Brief. 2021 Mar 15;35:106955. doi: 10.1016/j.dib.2021.106955. eCollection 2021 Apr.
5
Novel Fig-Associated Viroid-Like RNAs Containing Hammerhead Ribozymes in Both Polarity Strands Identified by High-Throughput Sequencing.
Front Microbiol. 2020 Aug 18;11:1903. doi: 10.3389/fmicb.2020.01903. eCollection 2020.
6
Development of sequence-based markers for seed protein content in pigeonpea.
Mol Genet Genomics. 2019 Feb;294(1):57-68. doi: 10.1007/s00438-018-1484-8. Epub 2018 Sep 1.
7
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.
PLoS One. 2016 Jun 16;11(6):e0157183. doi: 10.1371/journal.pone.0157183. eCollection 2016.
8
"Every Gene Is Everywhere but the Environment Selects": Global Geolocalization of Gene Sharing in Environmental Samples through Network Analysis.
Genome Biol Evol. 2016 May 13;8(5):1388-400. doi: 10.1093/gbe/evw077.
9
Genetic diversity, linkage disequilibrium and power of a large grapevine (Vitis vinifera L) diversity panel newly designed for association studies.
BMC Plant Biol. 2016 Mar 22;16:74. doi: 10.1186/s12870-016-0754-z.
10
De Novo Transcriptome Assembly and Comparative Analysis Elucidate Complicated Mechanism Regulating Astragalus chrysochlorus Response to Selenium Stimuli.
PLoS One. 2015 Oct 2;10(10):e0135677. doi: 10.1371/journal.pone.0135677. eCollection 2015.