A pipeline for local assembly of minisatellite alleles from single-molecule sequencing data.

Authors

Ogeh Denye, Badge Richard

Publication information

Bioinformatics. 2017 Mar 1;33(5):650-653. doi: 10.1093/bioinformatics/btw687.

DOI: 10.1093/bioinformatics/btw687
PMID: 27998939
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5408865/

Abstract

MOTIVATION

The advent of Next Generation Sequencing (NGS) has led to the generation of enormous volumes of short read sequence data, cheaply and in reasonable time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been greatly affected, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short read sequence data to scaffold repetitive structures, creating gaps, inversions and rearrangements and resulting in assemblies that are, at best, draft forms. Third generation single-molecule sequencing (SMS) technologies (e.g. Pacific Biosciences Single Molecule Real Time (SMRT) system) address this challenge by generating sequences with increased read lengths, offering the prospect to better recover these complex repetitive structures, concomitantly improving assembly quality.

RESULTS

Here, we evaluate the ability of SMS data (specifically human genome Pacific Biosciences SMRT data) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites). To do this we designed a pipeline for the collection, processing and local assembly of single-molecule sequence data to form accurate contiguous local reconstructions. Our results show the recovery of an allele of the non-coding minisatellite MS1 (located on chromosome 1 at 1p33-35) at greater than 97% identity to reference (GRCh38) from the unprocessed sequence data of a haploid complete hydatidiform mole (CHM1) cell line. Furthermore, our assembly revealed an allele of over 500 repeat units; much larger than the reference (GRCh38), but consistent in structure with naturally occurring alleles that are segregating in human populations. This local assembly's reconstruction was validated with the release of the whole genome assemblies GCA_001297185.1 and GCA_000772585.3, where this allele occurs. Additionally, application of this pipeline to coding minisatellites in the PRDM9 and ZNF93 genes enabled recovery of high identity allele structures for these sequence regions whose length was confirmed by PCR from cell line genomic DNA. The internal repeat structure of the PRDM9 allele recovered was consistent with common human-specific alleles.
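
The abstract reports the recovered MS1 allele at greater than 97% identity to GRCh38 but does not say how identity was computed. As a rough illustration only, and not the authors' pipeline, the sketch below computes percent identity from a global pairwise alignment as matching columns divided by total alignment columns. The function name `global_identity` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example. The dynamic programming is quadratic in time and memory, so it is practical only for short sequences; long-read data would normally go through a dedicated aligner.

```python
def global_identity(query: str, ref: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of a global (Needleman-Wunsch) alignment.

    Identity = matching columns / all alignment columns * 100.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(query), len(ref)
    # score[i][j] = best score for aligning query[:i] with ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two bases
                              score[i - 1][j] + gap,      # gap in ref
                              score[i][j - 1] + gap)      # gap in query

    # Trace back one optimal path, counting matches and columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += query[i - 1] == ref[j - 1]
                i, j, columns = i - 1, j - 1, columns + 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


if __name__ == "__main__":
    # Toy sequences, not real minisatellite data.
    print(global_identity("ACGT", "ACGT"))
    print(global_identity("ACGT", "ACTT"))
```

Under this definition, gaps count against identity in the same way as mismatches. Other conventions exist (for example, dividing by the reference length), so any reported figure should state which one it uses.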

AVAILABILITY AND IMPLEMENTATION

Code available at https://github.com/ndliberial/smrt_pipeline.

CONTACT

dno2@le.ac.uk.
