Probably Correct: Rescuing Repeats with Short and Long Reads.

Affiliation

Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic.

Publication information

Genes (Basel). 2020 Dec 31;12(1):48. doi: 10.3390/genes12010048.

Abstract

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair aligners and assemblers in both accuracy and speed. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
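
The dependence of multi-mapping on read length can be illustrated with a small sketch. It is not taken from the review: the function name `multi_mapping_fraction`, the toy sequence, and the simplifications (error-free reads, forward strand only, exact matching, one read per start position) are all assumptions made for illustration.

```python
from collections import Counter


def multi_mapping_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free, forward-strand reads of the given length
    whose sequence occurs at more than one position in `genome`."""
    n_reads = len(genome) - read_length + 1
    if read_length <= 0 or n_reads <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")
    # Take one read starting at every possible position.
    reads = [genome[i:i + read_length] for i in range(n_reads)]
    counts = Counter(reads)
    # A read is multi-mapping if its sequence is seen more than once.
    multi = sum(1 for read in reads if counts[read] > 1)
    return multi / n_reads


# Hypothetical toy genome: two copies of an 8 bp repeat with short flanks.
REPEAT = "ACGTTGCA"
TOY_GENOME = "GATC" + REPEAT + "TTCC" + REPEAT + "GGAT"

if __name__ == "__main__":
    for length in (4, 8, 12):
        fraction = multi_mapping_fraction(TOY_GENOME, length)
        print(f"read length {length}: {fraction:.2f} multi-mapping")
```

Real aligners also have to handle sequencing errors, reverse-complement matches, and approximate hits, so this sketch only conveys the general trend that reads long enough to reach unique flanking sequence stop being ambiguous.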


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/075a/7823596/05fba14dfb49/genes-12-00048-g001.jpg
