Probably Correct: Rescuing Repeats with Short and Long Reads.

Affiliation

Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic.

Publication information

Genes (Basel). 2020 Dec 31;12(1):48. doi: 10.3390/genes12010048.

DOI:10.3390/genes12010048

PMID:33396198

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7823596/

Abstract

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, which aids the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
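As an illustration of the abstract's point that read length and genome complexity together decide whether a read is multi-mapping, the following is a minimal sketch and not a method from the article. It assumes error-free reads, exact matching on a single strand, and a made-up toy sequence; the function name `multimapping_fraction` and the constants `REPEAT` and `TOY_GENOME` are hypothetical and introduced only for this example.

```python
from collections import Counter


def multimapping_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free reads of the given length that occur
    at more than one position in the genome (exact match, one strand)."""
    n_reads = len(genome) - read_length + 1
    if read_length <= 0 or n_reads <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")
    # Count how many times each possible read sequence occurs.
    counts = Counter(genome[i:i + read_length] for i in range(n_reads))
    # Every start position whose sequence occurs more than once is multi-mapping.
    multi = sum(c for c in counts.values() if c > 1)
    return multi / n_reads


# Hypothetical toy genome: two copies of a 12-bp repeat embedded in other sequence.
REPEAT = "ACGTTGCAAGCT"
TOY_GENOME = "GGATCCTA" + REPEAT + "TTAGGCAT" + REPEAT + "CCGATAGG"

for length in (4, 8, 12, 16):
    print(length, round(multimapping_fraction(TOY_GENOME, length), 3))
```

In this toy setting, reads that fit inside a repeat copy cannot be placed uniquely, while reads long enough to reach into the flanking sequence can. Real aligners also have to cope with sequencing errors, both strands, and much larger repeat families, none of which this sketch models.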

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/075a/7823596/05fba14dfb49/genes-12-00048-g001.jpg
