Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search.

Affiliation

Department of Applied Mathematics and Computer Science, Faculty of Sciences, Ghent University, Krijgslaan 281, S9, 9000 Gent, Belgium.

Publication information

BMC Bioinformatics. 2012 Nov 23;13:312. doi: 10.1186/1471-2105-13-312.

DOI:10.1186/1471-2105-13-312

PMID:23174036

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3554476/

Abstract

BACKGROUND

Sampling core subsets from genetic resources while preserving as much of the original collection's genetic diversity as possible is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Second, we try to gain more insight into the performance of the original Core Hunter search algorithm by comparing it with several other heuristics on several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II with the aim of improving the diversity of the constructed core sets and reducing their generation times.

RESULTS

Our results show that introducing minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always achieved when optimizing mean distance alone. Comparing the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the simpler algorithms perform slightly worse than REMC at lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC.
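The difference between mean and minimum distance measures can be illustrated with a small sketch. The distance matrix below is a made-up toy example, and the helper names (`mean_distance`, `min_distance`) are hypothetical; they are not taken from Core Hunter. The point is only that a core set can score well on mean pairwise distance while still containing two nearly identical accessions, which the minimum distance exposes.

```python
from itertools import combinations


def pairwise_distances(core, dist):
    """Distances for every unordered pair of accessions in the core set."""
    return [dist[a][b] for a, b in combinations(core, 2)]


def mean_distance(core, dist):
    """Average pairwise distance within the core set."""
    d = pairwise_distances(core, dist)
    return sum(d) / len(d)


def min_distance(core, dist):
    """Smallest pairwise distance within the core set."""
    return min(pairwise_distances(core, dist))


# Hypothetical symmetric distance matrix for four accessions.
# Accessions 0 and 1 are near-duplicates (distance 0.05).
dist = [
    [0.00, 0.05, 1.00, 0.50],
    [0.05, 0.00, 1.00, 0.50],
    [1.00, 1.00, 0.00, 0.50],
    [0.50, 0.50, 0.50, 0.00],
]

core_a = [0, 1, 2]  # contains the near-duplicate pair
core_b = [0, 2, 3]  # all members reasonably far apart

for name, core in (("core_a", core_a), ("core_b", core_b)):
    print(name, round(mean_distance(core, dist), 3), min_distance(core, dist))
```

In this toy case `core_a` has the higher mean distance but a much lower minimum distance than `core_b`, so an objective based on the mean alone would prefer the set with the redundant pair.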

CONCLUSION

The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes giving slightly better results than some of the simpler methods, although it does not always give the best results. Switching to the new Mixed Replica algorithm significantly improves overall results and runtimes. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
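As a rough illustration of the kind of search discussed here, the following is a minimal sketch of a generic replica exchange Monte Carlo search over fixed-size subsets. It is not Core Hunter's implementation and it is not the Mixed Replica algorithm: all replicas are identical Metropolis searches that differ only in temperature. The weighted-sum objective, the weights, the temperatures, the step counts and the random toy data are all assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations


def objective(core, dist, w_mean=0.5, w_min=0.5):
    """Assumed objective: weighted sum of mean and minimum pairwise distance."""
    d = [dist[a][b] for a, b in combinations(sorted(core), 2)]
    return w_mean * sum(d) / len(d) + w_min * min(d)


def swap_neighbour(core, n, rng):
    """Replace one selected accession by one currently unselected accession."""
    out = rng.choice(sorted(core))
    inn = rng.choice([i for i in range(n) if i not in core])
    return (core - {out}) | {inn}


def replica_exchange(dist, k, temps=(0.01, 0.05, 0.2), rounds=200, steps=20, seed=0):
    """Search for a size-k subset with a high objective value (requires 2 <= k < n)."""
    rng = random.Random(seed)
    n = len(dist)
    replicas = [frozenset(rng.sample(range(n), k)) for _ in temps]
    scores = [objective(c, dist) for c in replicas]
    best_score = max(scores)
    best = replicas[scores.index(best_score)]
    for _ in range(rounds):
        # Metropolis swap moves inside each replica at its own temperature.
        for i, t in enumerate(temps):
            for _ in range(steps):
                cand = swap_neighbour(replicas[i], n, rng)
                s = objective(cand, dist)
                if s >= scores[i] or rng.random() < math.exp((s - scores[i]) / t):
                    replicas[i], scores[i] = cand, s
                    if s > best_score:
                        best, best_score = cand, s
        # Attempt exchanges between neighbouring temperatures (cold to hot order).
        for i in range(len(temps) - 1):
            delta = (1 / temps[i] - 1 / temps[i + 1]) * (scores[i + 1] - scores[i])
            if delta >= 0 or rng.random() < math.exp(delta):
                replicas[i], replicas[i + 1] = replicas[i + 1], replicas[i]
                scores[i], scores[i + 1] = scores[i + 1], scores[i]
    return sorted(best), best_score


# Hypothetical toy data: 30 accessions as random points, Euclidean distances.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(30)]
dist = [[math.dist(p, q) for q in points] for p in points]

core, score = replica_exchange(dist, k=5)
print(core, round(score, 3))
```

The exchange step always moves a better solution toward the colder replica and otherwise swaps with a probability that shrinks as the score gap and temperature gap grow. Setting `w_min` to zero reduces the objective to mean distance alone, which makes it easy to compare the two kinds of measure on the same data.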
