RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.

Affiliations

Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.

CNRS, CRIStAL, 59655 Villeneuve d'Ascq, France.

Publication information

Bioinformatics. 2018 Apr 1;34(7):1125-1131. doi: 10.1093/bioinformatics/btx771.

Abstract

MOTIVATION

The haploid mammalian Y chromosome is usually under-represented in genome assemblies. This is because of its high repeat content and because its haploid nature gives it low sequencing depth. One strategy to compensate for the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. Because enrichment is imperfect, algorithms are needed to identify putative Y-specific reads before downstream assembly. A strategy that uses k-mer abundances to identify such reads was previously used to assemble the gorilla Y chromosome. However, that strategy required key parameters to be set manually, a time-consuming process that led to sub-optimal assemblies.
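The abundance-based read selection described above can be sketched as follows. This is a minimal illustration and not the code used for the gorilla Y assembly. It makes these assumptions:

- Reads are plain uppercase ACGT strings.
- Canonical k-mers are counted in memory with Python's `Counter`. A real pipeline would use a dedicated k-mer counter.
- A read is treated as putatively Y-derived when at least `min_fraction` of its k-mers occur at least `min_abundance` times.

The function name `select_y_reads` and both parameters are hypothetical. They stand in for the manually set parameters mentioned above.

```python
from collections import Counter

# Complement table for A/C/G/T; other characters (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield each overlapping k-mer of seq in canonical form
    (the lexicographically smaller of the k-mer and its reverse complement)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def select_y_reads(reads, k, min_abundance, min_fraction):
    """Hypothetical filter: keep reads in which at least `min_fraction` of the
    k-mers occur at least `min_abundance` times across the whole read set."""
    # Count every canonical k-mer over all reads.
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))

    selected = []
    for read in reads:
        read_kmers = list(canonical_kmers(read, k))
        if not read_kmers:  # read shorter than k: nothing to judge it by
            continue
        high = sum(1 for kmer in read_kmers if counts[kmer] >= min_abundance)
        if high / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected
```

Consider `reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]` with `k=4`, `min_abundance=3` and `min_fraction=0.8`. The five repeated reads are kept and the singleton read is discarded. Canonical k-mers are used because reads come from both strands, so a k-mer and its reverse complement should be counted together.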

RESULTS

We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. The algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternative strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection.
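The sketch below illustrates one way to choose the abundance level automatically from prior knowledge. It picks the smallest k-mer abundance at which at least `target_fraction` of the distinct read k-mers with that abundance also occur in a reference Y sequence. The reference could be a related species' Y or known Y transcripts. This selection rule, the function name `choose_threshold`, and the default `target_fraction=0.5` are assumptions made for illustration. RecoverY's actual criterion is described in the paper and its repository.

```python
from collections import Counter

# Complement table for A/C/G/T; other characters (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield each overlapping k-mer of seq as min(k-mer, reverse complement)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def choose_threshold(reads, reference_y, k, target_fraction=0.5):
    """Hypothetical rule: return the smallest abundance at which at least
    `target_fraction` of the distinct read k-mers with exactly that abundance
    also occur in the reference Y sequences; None if no abundance qualifies."""
    # Abundance of every canonical k-mer in the (enriched) read set.
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))

    # Canonical k-mers of the prior-knowledge Y sequences.
    reference = set()
    for seq in reference_y:
        reference.update(canonical_kmers(seq, k))

    # Abundance histogram over all distinct k-mers, and over the subset
    # that is supported by the reference.
    total = Counter(counts.values())
    supported = Counter(a for kmer, a in counts.items() if kmer in reference)

    for abundance in sorted(total):
        if supported[abundance] / total[abundance] >= target_fraction:
            return abundance
    return None
```

For example, `choose_threshold(["ACGTACGT"] * 5 + ["TTTTGGGG"], ["ACGTACGTAC"], k=4)` returns 5. The contaminant read's k-mers occur once and are absent from the reference. The reference-supported k-mers occur 5 or 10 times. The returned value can then serve as the abundance cutoff when deciding which reads to keep.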

AVAILABILITY AND IMPLEMENTATION

Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY.

CONTACT

kmakova@bx.psu.edu or pashadag@cse.psu.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

References

KMC 3: counting and manipulating k-mer statistics.
Bioinformatics. 2017 Sep 1;33(17):2759-2761. doi: 10.1093/bioinformatics/btx304.

Y and W Chromosome Assemblies: Approaches and Discoveries.
Trends Genet. 2017 Apr;33(4):266-282. doi: 10.1016/j.tig.2017.01.008. Epub 2017 Feb 22.

The pig X and Y Chromosomes: structure, sequence, and evolution.
Genome Res. 2016 Jan;26(1):130-9. doi: 10.1101/gr.188839.114. Epub 2015 Nov 11.

Comprehensive variation discovery in single human genomes.
Nat Genet. 2014 Dec;46(12):1350-5. doi: 10.1038/ng.3121. Epub 2014 Oct 19.

Informed and automated k-mer size selection for genome assembly.
Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.