Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly.

Affiliation

Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, MA 02142, USA.

Publication information

Bioinformatics. 2012 Jul 15;28(14):1838-44. doi: 10.1093/bioinformatics/bts280. Epub 2012 May 7.

DOI: 10.1093/bioinformatics/bts280
PMID: 22569178
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3389770/
Abstract

MOTIVATION

Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs.
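The idea of collapsing a read overlap graph into unitigs can be illustrated with a toy example. The sketch below is not fermi's implementation: it assumes a plain directed graph whose nodes are hypothetical read IDs (`r1`, `r2`, ...) and ignores strand, whereas a real string graph is bidirected. It simply merges every chain of nodes joined by unambiguous edges (the only out-edge of one node and the only in-edge of the next) into a single path.

```python
from collections import defaultdict


def unitig_paths(edges):
    """Group nodes of a directed overlap graph into maximal non-branching paths.

    `edges` is an iterable of (u, v) pairs meaning "read u overlaps read v".
    Strand and sequence content are ignored; this is a toy model only.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def mergeable(u, v):
        # u -> v can be merged when it is u's only out-edge and v's only in-edge.
        return len(succ[u]) == 1 and len(pred[v]) == 1

    paths = []
    seen = set()

    # Pass 1: start a path at every node that cannot be merged into a predecessor.
    for node in sorted(nodes):
        if len(pred[node]) == 1 and mergeable(pred[node][0], node):
            continue  # interior or end of some other path
        path = [node]
        seen.add(node)
        cur = node
        while len(succ[cur]) == 1 and mergeable(cur, succ[cur][0]):
            cur = succ[cur][0]
            path.append(cur)
            seen.add(cur)
        paths.append(path)

    # Pass 2: whatever is left lies on an isolated simple cycle.
    for node in sorted(nodes):
        if node in seen:
            continue
        path = [node]
        seen.add(node)
        cur = succ[node][0]
        while cur not in seen:
            path.append(cur)
            seen.add(cur)
            cur = succ[cur][0]
        paths.append(path)

    return paths


# Hypothetical overlap graph: a linear chain r1-r2-r3 that branches at r3.
example_edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r3", "r5"), ("r5", "r6")]
print(unitig_paths(example_edges))
# [['r1', 'r2', 'r3'], ['r4'], ['r5', 'r6']]
```

The branch at `r3` ends the first path, so the ambiguity stays visible in the graph rather than being resolved by the assembler.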

RESULTS

To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information in the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that, in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly-based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches, and one-pass construction of unitigs from an FMD-index.
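To make the notion of a super-maximal exact match (SMEM) concrete, here is a brute-force sketch. It does not use an FMD-index and is not the paper's algorithm: it relies on plain substring search, which is far slower. It assumes an SMEM is an exact match between the query and either strand of the reference that cannot be extended in either direction and is not contained in any other such match on the query. The function names, the `min_len` parameter, and the example sequences are all hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def naive_smems(query, reference, min_len=1):
    """Return SMEMs as half-open (start, end) intervals on the query.

    Brute force for illustration only; min_len is expected to be >= 1.
    """
    # Search both strands at once; "$" keeps matches from spanning the joint.
    both = reference + "$" + revcomp(reference)
    smems = []
    prev_end = 0  # furthest match end reached from any earlier start
    for start in range(len(query)):
        # If query[start-1:prev_end] matched, so does its suffix query[start:prev_end].
        end = max(start, prev_end)
        while end < len(query) and query[start:end + 1] in both:
            end += 1
        # A match reaching past every earlier match cannot be extended to the
        # left and is not contained in any other match on the query.
        if end > prev_end and end - start >= min_len:
            smems.append((start, end))
        prev_end = max(prev_end, end)
    return smems


print(naive_smems("TTACCCTGT", "GATTACAGG"))
# [(0, 4), (3, 5), (4, 9)]
print(naive_smems("TTACCCTGT", "GATTACAGG", min_len=3))
# [(0, 4), (4, 9)]
```

In the example, `TTAC` matches the forward strand and `CCTGT` matches the reverse complement, which is why both strands are searched together.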

AVAILABILITY

http://github.com/lh3/fermi
