Royal Botanic Gardens, Kew, Richmond, Surrey, UK.
Estonian University of Life Sciences, Tartu, Estonia.
Mol Ecol Resour. 2021 May;21(4):1037-1055. doi: 10.1111/1755-0998.13314. Epub 2021 Jan 9.
Obtaining informative data is the ambition of any genomic project, but in nonmodel species with very large genomes, pursuing such a goal requires surmounting a series of analytical challenges. Double-digest RAD sequencing is routinely used in nonmodel organisms and offers some control over the volume of data obtained. However, the volume of data recovered is not always an indication of the reliability of data sets, and quality checks are necessary to ensure that true and artefactual information is set apart. In the present study, we aim to fill the gap existing between the known applicability of RAD sequencing methods in plants with large genomes and the use of the retrieved loci for population genetic inference. By analysing two populations of Cypripedium calceolus, a nonmodel orchid species with a large genome size (1C ~ 31.6 Gbp), we provide a complete workflow from library preparation to bioinformatic filtering and inference of genetic diversity and differentiation. We show how filtering strategies to dismiss potentially misleading data need to be explored and adapted to data set-specific features. Moreover, we suggest that the occurrence of organellar sequences in libraries should not be neglected when planning the experiment and analysing the results. Finally, we explain how, in the absence of prior information about the genome of the species, seeking high standards of quality during library preparation and sequencing can provide an insurance against unpredicted technical or biological constraints.
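The kind of data set-specific locus filtering the abstract refers to can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `locus_stats` and `keep_locus`, the genotype encoding, and the default thresholds (`max_missing`, `max_h_obs`) are all assumptions chosen for illustration. The sketch drops loci with too much missing data, and loci whose observed heterozygosity is suspiciously high, a pattern often taken as a sign of collapsed paralogs in large genomes.

```python
from collections import Counter

def locus_stats(genotypes):
    """Summarise one locus.

    genotypes: list with one entry per individual, either a tuple of two
    alleles, e.g. ("A", "G"), or None for a missing call (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return {"missing": 1.0, "h_obs": 0.0, "h_exp": 0.0}
    missing = 1 - n / len(genotypes)
    # Observed heterozygosity: share of called individuals with two different alleles.
    h_obs = sum(1 for a, b in called if a != b) / n
    # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
    counts = Counter(allele for g in called for allele in g)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return {"missing": missing, "h_obs": h_obs, "h_exp": h_exp}

def keep_locus(genotypes, max_missing=0.2, max_h_obs=0.6):
    """Return True if the locus passes both illustrative thresholds.

    The defaults are placeholders (assumptions), to be tuned per data set.
    """
    s = locus_stats(genotypes)
    return s["missing"] <= max_missing and s["h_obs"] <= max_h_obs

# Hypothetical toy loci, not data from the study.
balanced = [("A", "A"), ("A", "G"), ("G", "G"), None]
all_het = [("A", "G")] * 4  # every individual heterozygous: paralog-like signal

print(locus_stats(balanced))
print(keep_locus(balanced, max_missing=0.3))
print(keep_locus(all_het))
```

Rerunning `keep_locus` over a grid of thresholds and comparing how many loci survive is one simple way to explore how sensitive downstream diversity estimates are to the filtering choices.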