

DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies.

Author affiliations

Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA.

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA.

Publication information

BMC Genomics. 2019 Aug 9;20(1):641. doi: 10.1186/s12864-019-5996-3.

Abstract

BACKGROUND

Although the Y chromosome plays an important role in male sex determination and fertility, it remains understudied because it is haploid and repetitive. Methods to isolate Y-specific contigs from a whole-genome assembly fall broadly into two categories. The first retrieves Y-contigs by their proportion sharing with a female, but this strategy is prone to false positives in the absence of a high-quality, complete female reference. The second uses the ratio of depth of coverage from male and female reads to select Y-contigs, but this method requires high-depth sequencing of a female and cannot make use of existing female references.

RESULTS

We develop a k-mer based method called DiscoverY, which combines proportion sharing with a female and depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes across different sequencing platforms, including Illumina, 10X, and PacBio. When the male and female data are of high quality, DiscoverY achieves high precision and recall and outperforms existing methods. For cases where a high-quality female reference is not available, we quantify the effect of using a draft reference or even just raw sequencing reads from a female.
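To make the two signals concrete, the following is a minimal toy sketch, not DiscoverY's actual implementation. It computes, for one contig, the fraction of its k-mers that also occur in a female k-mer set and the median abundance of its k-mers in male reads, then flags the contig when sharing is low and abundance is near half the autosomal depth (as would be expected for a haploid chromosome). The function names, the thresholds `max_shared`, `low` and `high`, and the `autosomal_depth` parameter are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every substring of length k in seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def contig_features(contig, female_kmers, male_counts, k):
    """Return (proportion of k-mers shared with female, median male abundance)."""
    contig_kmers = list(kmers(contig, k))
    if not contig_kmers:  # contig shorter than k
        return 0.0, 0
    shared = sum(1 for km in contig_kmers if km in female_kmers)
    proportion = shared / len(contig_kmers)
    abundance = median(male_counts.get(km, 0) for km in contig_kmers)
    return proportion, abundance


def is_candidate_y(contig, female_kmers, male_counts, k, autosomal_depth,
                   max_shared=0.2, low=0.25, high=0.75):
    """Hypothetical rule: little sharing with female, roughly half depth in male."""
    proportion, abundance = contig_features(contig, female_kmers, male_counts, k)
    depth_ok = low * autosomal_depth <= abundance <= high * autosomal_depth
    return proportion <= max_shared and depth_ok


if __name__ == "__main__":
    k = 4
    female_kmers = set(kmers("ACGTACGTACGT", k))          # toy female reference
    male_reads = ["ACGTACGT"] * 4 + ["TTGGCCAA"] * 2      # toy male reads
    male_counts = count_kmers(male_reads, k)
    for contig in ("ACGTACGT", "TTGGCCAA"):
        print(contig, is_candidate_y(contig, female_kmers, male_counts, k,
                                     autosomal_depth=4))
```

In this toy input the second contig shares no k-mers with the female sequence and appears at half the depth of the first, so only it is flagged. Real data would need canonical k-mers, an efficient k-mer counter, and thresholds chosen from the observed distributions.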

CONCLUSION

DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0819/6688218/f32036325563/12864_2019_5996_Fig1_HTML.jpg
