Cabrera Andrea A, Rey-Iglesia Alba, Louis Marie, Skovrind Mikkel, Westbury Michael V, Lorenzen Eline D
Globe Institute, University of Copenhagen, Copenhagen K, Denmark.
Greenland Institute of Natural Resources, Nuuk, Greenland.
Ecol Evol. 2022 Aug 25;12(8):e9185. doi: 10.1002/ece3.9185. eCollection 2022 Aug.
Accurate sex identification is crucial for elucidating the biology of a species. In the absence of directly observable sexual characteristics, sex identification of wild fauna can be challenging, if not impossible. Molecular sexing offers a powerful alternative to morphological sexing approaches. Here, we present SeXY, a novel sex-identification pipeline for very low-coverage shotgun sequencing data from a single individual. SeXY was designed to utilize low-effort screening data for sex identification and does not require a conspecific sex-chromosome assembly as reference. We assess how sensitive the accuracy of our pipeline is to data quantity, by downsampling sequencing data from 100,000 to 1,000 mapped reads, and to reference genome selection, by mapping to a variety of reference genomes of differing quality and phylogenetic distance. We show that our method is 100% accurate when mapping to a high-quality (highly contiguous; N50 > 30 Mb) conspecific genome, even down to 1,000 mapped reads. For lower-quality reference assemblies (N50 < 30 Mb), our method is 100% accurate with 50,000 mapped reads, regardless of reference assembly quality or phylogenetic distance. The SeXY pipeline provides several advantages over previously implemented methods: SeXY (i) requires sequencing data from only a single individual, (ii) does not require assembled conspecific sex chromosomes, or even a conspecific reference assembly, (iii) takes into account variation in coverage across the genome, and (iv) is accurate with only 1,000 mapped reads in many cases.
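The downsampling evaluation described in the abstract can be illustrated with a minimal Python sketch. This is not the SeXY implementation, and the abstract does not specify its internals. The sketch assumes an XY sex-determination system, assumes the X-linked scaffolds of the reference are already known, and assumes sex is called from the length-normalized ratio of X-linked to autosomal read counts. The function names, the toy scaffold names, and the 0.75 threshold are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def downsample(read_scaffolds, n, seed=0):
    """Randomly keep n mapped reads (each read is represented by the
    name of the scaffold it mapped to). Seeded for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(read_scaffolds, n)


def x_to_autosome_ratio(read_scaffolds, scaffold_lengths, x_scaffolds):
    """Return (X reads per base) / (autosomal reads per base).

    Assumption: x_scaffolds is a known set of X-linked scaffold names;
    every other scaffold in scaffold_lengths is treated as autosomal.
    """
    counts = Counter(read_scaffolds)
    x_reads = sum(c for s, c in counts.items() if s in x_scaffolds)
    a_reads = sum(c for s, c in counts.items() if s not in x_scaffolds)
    x_len = sum(l for s, l in scaffold_lengths.items() if s in x_scaffolds)
    a_len = sum(l for s, l in scaffold_lengths.items() if s not in x_scaffolds)
    if a_reads == 0 or x_len == 0 or a_len == 0:
        raise ValueError("need autosomal reads and non-empty X/autosome sets")
    return (x_reads / x_len) / (a_reads / a_len)


def call_sex(ratio, threshold=0.75):
    """Hypothetical decision rule: a ratio near 1 suggests two X copies (XX),
    a ratio near 0.5 suggests one X copy (XY). The threshold is illustrative."""
    return "XX" if ratio >= threshold else "XY"


if __name__ == "__main__":
    # Toy reference: one autosomal scaffold and one X-linked scaffold.
    lengths = {"chr1": 900, "chrX": 100}
    x_set = {"chrX"}
    # Toy individual with half the expected per-base coverage on X.
    reads = ["chr1"] * 900 + ["chrX"] * 50
    for n in (900, 500, 100):
        subset = downsample(reads, n)
        ratio = x_to_autosome_ratio(subset, lengths, x_set)
        print(n, round(ratio, 2), call_sex(ratio))
```

Repeating the call at decreasing read counts, as in the loop above, mirrors the idea of testing how the sex call behaves as data quantity shrinks. With real data the ratio becomes noisier at low read counts, which is why the evaluation over a range of read counts matters.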