Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants.

Publication information

Plant Genome. 2019 Mar;12(1). doi: 10.3835/plantgenome2018.05.0023.

Abstract

Single nucleotide polymorphisms (SNPs) have many advantages as molecular markers since they are ubiquitous and codominant. However, the discovery of true SNPs in polyploid species is difficult. Peanut (Arachis hypogaea L.) is an allopolyploid with a very low rate of true SNP calling. A large set of true and false SNPs identified from the Axiom_Arachis 58k array was leveraged to train machine-learning models to identify true SNPs directly from sequence data and so reduce ascertainment bias. These models achieved accuracy rates above 80% using real peanut RNA sequencing (RNA-seq) and whole-genome shotgun (WGS) resequencing data, which is higher than previously reported for polyploids and at least a twofold improvement for peanut. A 48K SNP array, Axiom_Arachis2, was designed using this approach, resulting in 75% accuracy of calling SNPs from different tetraploid peanut genotypes. When the method was used to simulate SNP variation in several polyploids, models achieved >98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes were able to select true SNPs at >80% accuracy using real peanut data. This work accomplished its objective of creating an effective approach for calling highly reliable SNPs from polyploids using machine learning. A novel tool for predicting true SNPs from sequence data, designated SNP machine learning (SNP-ML), was developed using the described models. SNP-ML additionally provides functionality, designated SNP machine learner (SNP-MLer), to train new models not included in this study for customized use. SNP-ML is publicly available.
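The general idea of training on array-validated calls and then scoring accuracy on held-out calls can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which features or learning algorithms SNP-ML uses, so the three features (alternate-allele fraction, read depth, mean mapping quality), the toy values, and the nearest-centroid classifier are hypothetical stand-ins, not the authors' method.

```python
from statistics import mean

# Hypothetical per-SNP features: (alt-allele fraction, read depth, mean mapping quality).
# Labels stand in for array validation: True = real SNP, False = false call.
# All values are invented for illustration; none come from the study.
TRAIN = [
    ((0.48, 35, 58), True),
    ((0.52, 40, 60), True),
    ((0.45, 28, 57), True),
    ((0.12, 90, 31), False),
    ((0.09, 75, 28), False),
    ((0.15, 110, 35), False),
]
TEST = [
    ((0.50, 33, 59), True),
    ((0.11, 95, 30), False),
    ((0.47, 45, 55), True),
    ((0.14, 80, 33), False),
]


def fit_scaler(rows):
    """Record per-feature minimum and range so features share a 0-1 scale."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lows, spans


def scale(row, scaler):
    lows, spans = scaler
    return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]


def fit_centroids(data, scaler):
    """Average the scaled feature vectors of each class."""
    centroids = {}
    for label in (True, False):
        rows = [scale(x, scaler) for x, y in data if y is label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(row, scaler, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    z = scale(row, scaler)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(z, centroids[label])),
    )


def accuracy(data, scaler, centroids):
    """Fraction of calls whose predicted class matches the validation label."""
    hits = sum(predict(x, scaler, centroids) == y for x, y in data)
    return hits / len(data)


scaler = fit_scaler([x for x, _ in TRAIN])
centroids = fit_centroids(TRAIN, scaler)
print(f"held-out accuracy: {accuracy(TEST, scaler, centroids):.2f}")
```

Accuracy here is simply the share of held-out calls classified correctly, which is the sense in which figures such as "above 80%" are read in this sketch; the toy data are cleanly separable, so the number it prints says nothing about real polyploid data.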
