Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France.
INRAE, Unité de Mathématiques et Informatique Appliquées - Toulouse, Toulouse, France.
BMC Bioinformatics. 2022 Nov 19;23(1):499. doi: 10.1186/s12859-022-05045-7.
Genotyping and sequencing technologies produce increasingly large numbers of genetic markers, potentially with high rates of missing or erroneous data, which makes linkage map construction increasingly complex. Moreover, the size of segregating populations remains constrained by cost and is increasingly out of proportion with the number of SNPs available. Guaranteeing a statistically robust marker order therefore requires that maps include only a carefully selected subset of SNPs.
In this context, the SeSAM software automates genetic map construction using seriation and placement approaches. It produces (1) a high-robustness framework map that includes as many markers as possible while keeping order robustness above a given statistical threshold, and (2) a high-density total map comprising the framework plus almost all polymorphic markers. Throughout this process, care is taken to limit the impact of genotyping errors and missing data on mapping quality. SeSAM can be used with a wide range of biparental populations, including those from outcrossing species, for which phases are inferred on the fly by maximum likelihood during map elongation. The package also includes functions to simulate data sets, convert data formats, detect putative genotyping errors, visualize data and map quality (including graphical genotypes), and merge several maps into a consensus. SeSAM is also suitable for interactive map construction, as it provides lower-level functions for 2-point and multipoint EM analyses. The software is implemented as an R package with functions written in C++.
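To make the notion of a 2-point analysis concrete, the following is a minimal sketch in Python of estimating a recombination fraction and LOD score between two markers. It is not SeSAM's API or algorithm: the function name `two_point_backcross`, the 0/1/`None` genotype coding, and the restriction to a backcross with known phase are all assumptions made for illustration. In that simple case the maximum-likelihood estimate has a closed form, so no EM iteration is needed; population types with ambiguous genotype classes or unknown phase would require an iterative approach.

```python
import math


def two_point_backcross(marker_a, marker_b):
    """Estimate recombination fraction and LOD between two backcross markers.

    Hypothetical helper for illustration only. Genotypes are coded 0 or 1,
    with None for missing data; phase is assumed known (coupling).
    """
    # Keep only individuals genotyped at both markers.
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no individuals genotyped at both markers")

    # In a backcross with known phase, a recombinant is a genotype mismatch.
    n_rec = sum(1 for a, b in pairs if a != b)

    # Closed-form ML estimate, capped at 0.5 (independent segregation).
    r = min(n_rec / n, 0.5)

    def log10_likelihood(theta):
        # Binomial log-likelihood; zero-count terms are skipped to avoid log(0).
        ll = 0.0
        if n_rec:
            ll += n_rec * math.log10(theta)
        if n - n_rec:
            ll += (n - n_rec) * math.log10(1.0 - theta)
        return ll

    # LOD compares the fitted model against free recombination (theta = 0.5).
    lod = log10_likelihood(r) - log10_likelihood(0.5)
    return r, lod


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 1, None, 1, 0, 1]
    b = [0, 0, 1, 1, 1, 1, 0, 1, 0, None]
    r, lod = two_point_backcross(a, b)
    print(f"r = {r:.3f}, LOD = {lod:.2f}")
```

Individuals with a missing genotype at either marker are simply dropped from the pair, which is one common way to handle missing data in pairwise estimates; it illustrates why high missing-data rates reduce the information available for ordering markers.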
SeSAM is fully automatic linkage mapping software designed to (1) produce a framework map as robust as desired by optimizing the selection of a subset of markers, and (2) produce a high-density map including almost all polymorphic markers. The software can be used with a wide range of biparental mapping populations, including those derived from outcrossing species. SeSAM is freely available under a GNU GPL v3 license and runs on Linux, Windows, and macOS. It can be downloaded together with its user manual and quick-start tutorial from ForgeMIA (SeSAM project) at https://forgemia.inra.fr/gqe-acep/sesam/-/releases.