Graduate School of Agriculture, Kyoto University, Kyoto, Japan.
School of System Design and Technology, Tokyo Denki University, Tokyo, Japan.
BMC Bioinformatics. 2022 Jul 8;23(1):265. doi: 10.1186/s12859-022-04801-z.
Parentage information is fundamental to many fields of the life sciences. Recent advances in sequencing technologies have made it possible to infer parentage accurately even in non-model species. Optimizing sets of genome-wide markers is valuable for cost-effective applications but requires extremely large amounts of computation, which calls for the development of new, efficient algorithms.
Here, for a closed half-sib population, we formulated the selection of marker loci as a binary integer programming problem. The proposed systematic formulation takes into account marker localization and the family structure of the potential parental population, yielding accurate assignment with a small set of markers. We also proposed an efficient heuristic approach, which improved the resulting marker set in terms of the number of markers, their localization, and tolerance to missing data. Applying this method to the actual genotypes of apple (Malus × domestica) germplasm, we identified a set of 34 SNP markers that distinguished among 300 potential parents crossed with a particular cultivar with greater than 99% accuracy.
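As a rough illustration of posing marker selection as a binary integer program, the sketch below uses a deliberately simplified model: choose the fewest markers (binary variables x_j) so that every pair of candidate parents differs in genotype at no fewer than k selected markers, i.e. minimize Σ_j x_j subject to Σ_{j ∈ D_ab} x_j ≥ k for every candidate pair (a, b), where D_ab is the set of markers at which candidates a and b differ. This is an assumption, not the IP-SIMPAT formulation: it ignores the alleles contributed by the known parent, marker localization, and genotyping-error models. The function names and the 0/1/2 genotype coding are hypothetical. The exact solver enumerates subsets and is only practical for a handful of markers; the greedy routine is a generic set-multicover heuristic, not the heuristic proposed in this study.

```python
from itertools import combinations

# Hypothetical simplified model: candidate genotypes coded 0/1/2 per marker.
# A marker "distinguishes" two candidates if their genotypes differ there.

def distinguishing_sets(genotypes):
    """Map each unordered candidate pair (a, b) to the set D_ab of differing markers."""
    pairs = {}
    for a, b in combinations(range(len(genotypes)), 2):
        pairs[(a, b)] = {
            j for j, (ga, gb) in enumerate(zip(genotypes[a], genotypes[b])) if ga != gb
        }
    return pairs

def is_feasible(selected, pairs, k):
    """Check the BIP constraints: every pair is separated by >= k selected markers."""
    return all(len(diff & selected) >= k for diff in pairs.values())

def exact_min_markers(genotypes, k=1):
    """Minimize the number of selected markers by exhaustive search (tiny inputs only)."""
    m = len(genotypes[0])
    pairs = distinguishing_sets(genotypes)
    for size in range(m + 1):  # smallest subsets first, so the first hit is optimal
        for subset in combinations(range(m), size):
            if is_feasible(set(subset), pairs, k):
                return sorted(subset)
    return None  # no subset satisfies the constraints

def greedy_markers(genotypes, k=1):
    """Generic greedy set-multicover heuristic (not the paper's heuristic)."""
    m = len(genotypes[0])
    pairs = distinguishing_sets(genotypes)
    need = {p: k for p in pairs}  # remaining coverage each pair still requires
    selected = set()
    while any(v > 0 for v in need.values()):
        best, best_gain = None, 0
        for j in range(m):
            if j in selected:
                continue
            gain = sum(1 for p, v in need.items() if v > 0 and j in pairs[p])
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            return None  # some pair cannot reach k distinguishing markers
        selected.add(best)
        for p in pairs:
            if best in pairs[p] and need[p] > 0:
                need[p] -= 1
    return sorted(selected)

# Toy data: 4 hypothetical candidate parents genotyped at 4 markers.
candidates = [
    [0, 1, 2, 0],
    [0, 2, 2, 1],
    [1, 1, 0, 1],
    [2, 1, 2, 1],
]
print(exact_min_markers(candidates))  # [0, 1]
print(greedy_markers(candidates))     # [0, 1]
```

Raising k (for example, k = 2) requires each pair to be separated by several markers, which is one simple way to model tolerance to missing genotype calls; on this toy input it forces all four markers to be kept. A real application would hand the same objective and constraints to an integer programming solver rather than enumerate subsets.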
We present a novel approach for selecting informative markers based on binary integer programming. Because the data generated by high-throughput sequencing far exceed what parentage assignment requires, combining systematic marker selection with targeted SNP genotyping, such as KASP, allows the analysis to be flexibly scaled up to sizes that have been impractical in many species. The method developed in this study can be applied directly to unsolved large-scale problems in breeding, reproduction, and ecological research, and is expected to yield new insights in various biological fields. The implementation is available at https://github.com/SoNishiyama/IP-SIMPAT .