Read clouds uncover variation in complex regions of the human genome.

Authors

Bishara Alex, Liu Yuling, Weng Ziming, Kashef-Haghighi Dorna, Newburger Daniel E, West Robert, Sidow Arend, Batzoglou Serafim

Affiliations

Department of Computer Science, Stanford University, Stanford, California 94305, USA;

Department of Computer Science, Stanford University, Stanford, California 94305, USA; Department of Chemistry, Stanford University, Stanford, California 94305, USA;

Publication information

Genome Res. 2015 Oct;25(10):1570-80. doi: 10.1101/gr.191189.115. Epub 2015 Aug 18.

DOI:10.1101/gr.191189.115
PMID:26286554
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4579342/
Abstract

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.
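
The abstract's central idea, that short reads from one read cloud constrain each other's placement, can be illustrated with a toy sketch. This is not the authors' RFA implementation: it only assumes that reads in a cloud come from one long fragment and should therefore map close together, and it uses exhaustive search with a hard distance penalty rather than Markov Random Field inference. The constants `FRAGMENT_LEN` and `PAIR_PENALTY`, the function names, and the example scores are all hypothetical.

```python
from itertools import product

FRAGMENT_LEN = 10_000  # assumed long-fragment length in bp (hypothetical value)
PAIR_PENALTY = 5.0     # assumed cost when two reads of one cloud map far apart


def joint_score(assignment, unary):
    """Score one joint placement of all reads in a cloud.

    Sum of per-read alignment scores, minus a penalty for every pair of
    reads placed farther apart than one fragment length.
    """
    score = sum(unary[i][pos] for i, pos in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            if abs(assignment[i] - assignment[j]) > FRAGMENT_LEN:
                score -= PAIR_PENALTY
    return score


def best_joint_placement(unary):
    """Exhaustively try every combination of candidate positions.

    Only feasible for a toy cloud; it stands in for real inference.
    """
    candidates = [sorted(u) for u in unary]
    return max(product(*candidates), key=lambda a: joint_score(a, unary))


# One read cloud. Each dict maps candidate position -> alignment score.
# Read 0 maps uniquely; reads 1 and 2 fall in a repeat with two copies.
cloud = [
    {1_002_000: 10.0},
    {1_004_500: 9.0, 5_004_500: 9.5},
    {1_007_000: 9.0, 5_007_000: 9.0},
]

print(best_joint_placement(cloud))
```

Scored independently, read 1 would be placed at the slightly better-scoring copy near 5,004,500. Scored jointly, the uniquely mapped read 0 pulls both ambiguous reads to the copy near 1,000,000, which is the kind of disambiguation the abstract attributes to read clouds.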


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f44f/4579342/6fa4acc4cdfe/1570f05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f44f/4579342/f1bc15187c98/1570f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f44f/4579342/b175cd358abf/1570f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f44f/4579342/7597972f23d6/1570f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f44f/4579342/0debb4f0ede2/1570f04.jpg
