Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest.

Affiliations

IMAG, Univ Montpellier, CNRS, UMR 5149, Montpellier, France.

ISA, INRAE, CNRS, Univ Côte d'Azur, Sophia Antipolis, France.

Publication information

Mol Ecol Resour. 2021 Nov;21(8):2598-2613. doi: 10.1111/1755-0998.13413. Epub 2021 May 21.

Abstract

Simulation-based methods such as approximate Bayesian computation (ABC) are well suited to the analysis of complex scenarios of population and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions for conducting efficient inferences about scenario choice and parameter estimation. The Random Forest (RF) methodology is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows inferences to be conducted at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and without deriving ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
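
The scenario-choice idea described in the abstract can be illustrated with a minimal sketch. It is not the DIYABC Random Forest implementation: the two Gaussian "scenarios", the three summary statistics, and the forest of one-split trees (stumps) are all hypothetical stand-ins chosen so that the example runs with the Python standard library alone. It shows only the general workflow: simulate a reference table from the prior, train a bagged ensemble on all summary statistics without preselecting any, and classify a data set by majority vote with no tolerance threshold.

```python
import random
import statistics
from collections import Counter

def simulate(scenario, rng, n=50):
    # Hypothetical toy simulator: the two scenarios differ only in spread.
    sd = 1.0 if scenario == 0 else 2.0
    return [rng.gauss(0.0, sd) for _ in range(n)]

def summarize(data):
    # All statistics are kept, informative or not (the mean carries no signal here).
    return [statistics.mean(data),
            statistics.pvariance(data),
            max(abs(x) for x in data)]

def build_reference_table(n_sims, rng):
    # Each row: (summary statistics, scenario index drawn from a uniform prior).
    table = []
    for _ in range(n_sims):
        scenario = rng.randrange(2)
        table.append((summarize(simulate(scenario, rng)), scenario))
    return table

def majority(labels, default=0):
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(sample, rng, n_features, n_thresholds=10):
    # One-split tree on a randomly chosen statistic (a simplification of a full tree).
    feature = rng.randrange(n_features)
    best = None
    for stats, _ in rng.sample(sample, n_thresholds):
        thr = stats[feature]
        left = [lab for s, lab in sample if s[feature] <= thr]
        right = [lab for s, lab in sample if s[feature] > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, feature, thr, l_lab, r_lab)
    return best[1:]

def fit_forest(table, rng, n_trees=100):
    n_features = len(table[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(table) for _ in table]  # resample with replacement
        forest.append(fit_stump(bootstrap, rng, n_features))
    return forest

def predict(forest, stats):
    # Majority vote across trees; returns the winning scenario and the vote counts.
    votes = Counter(l if stats[f] <= thr else r for f, thr, l, r in forest)
    return votes.most_common(1)[0][0], votes

rng = random.Random(0)
forest = fit_forest(build_reference_table(400, rng), rng)

# Evaluate on pseudo-observed data sets whose true scenario is known.
test_table = build_reference_table(200, rng)
accuracy = sum(predict(forest, s)[0] == lab for s, lab in test_table) / len(test_table)
print(f"classification accuracy on pseudo-observed data: {accuracy:.2f}")
```

The vote counts returned by `predict` indicate which scenario the trees favour, but the share of votes should not be read as a posterior probability of that scenario.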
