Decision Tree Algorithm-Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging.

Author information

Yang Cheng-Hong, Wu Kuo-Chuan, Chuang Li-Yeh, Chang Hsueh-Wei

Affiliations

Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.

Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.

Publication information

Evol Bioinform Online. 2018 Mar 5;14:1176934318760856. doi: 10.1177/1176934318760856. eCollection 2018.

Abstract

DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs, which creates a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, the use of barcode sequences for identification is rarely emphasized at present. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets for feature selection to discriminate between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To test this hypothesis, we evaluated a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. During decision tree construction, these SNPs were used to set up decision rules that assign the sequences to 2 groups, level by level. After algorithm processing, 37 nodes and 31 loci were required to discriminate the 38 species. Finally, sequence tags consisting of 31 SNP barcodes were identified for discriminating the 38 Brassicaceae species, based on the decision tree-selected SNP pattern from the RSB method. Taken together, this study provides the rationale that the SNP aspect of the DNA barcode for the rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
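The level-by-level splitting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the species names, the SNP patterns, the balanced-split criterion, and the function names (`best_split`, `build_tree`, `classify`, `count_nodes`) are all assumptions chosen for illustration. It only shows how SNP alleles at aligned loci can serve as yes/no decision rules that divide a set of species into 2 groups repeatedly until each species is isolated.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): species -> alleles at aligned SNP loci.
TOY_SNPS = {
    "species_A": "ACGT",
    "species_B": "ACGA",
    "species_C": "TCGA",
    "species_D": "TGGA",
    "species_E": "TGCA",
}

def best_split(snps, names):
    """Pick the (locus, allele) whose yes/no split of `names` is most balanced.

    The balance criterion is an assumption; the paper's rule may differ.
    """
    best = None
    n_loci = len(next(iter(snps.values())))
    for locus in range(n_loci):
        counts = Counter(snps[name][locus] for name in names)
        for allele, count in counts.items():
            if count == len(names):
                continue  # this locus does not separate the group
            imbalance = abs(len(names) - 2 * count)
            if best is None or imbalance < best[0]:
                best = (imbalance, locus, allele)
    return best

def build_tree(snps, names):
    """Recursively split species into 2 groups until each leaf holds one species."""
    if len(names) == 1:
        return names[0]
    split = best_split(snps, names)
    if split is None:
        raise ValueError(f"SNP patterns cannot separate: {names}")
    _, locus, allele = split
    yes = [n for n in names if snps[n][locus] == allele]
    no = [n for n in names if snps[n][locus] != allele]
    return {"locus": locus, "allele": allele,
            "yes": build_tree(snps, yes), "no": build_tree(snps, no)}

def classify(tree, pattern):
    """Follow the decision rules from the root to a leaf (a species name)."""
    while isinstance(tree, dict):
        branch = "yes" if pattern[tree["locus"]] == tree["allele"] else "no"
        tree = tree[branch]
    return tree

def count_nodes(tree):
    """Count internal (decision) nodes; leaves are plain species names."""
    if not isinstance(tree, dict):
        return 0
    return 1 + count_nodes(tree["yes"]) + count_nodes(tree["no"])

tree = build_tree(TOY_SNPS, sorted(TOY_SNPS))
```

In any such binary tree, separating n species into single-species leaves takes n - 1 decision nodes, which is consistent with the 37 nodes reported for 38 species. Calling `classify(tree, TOY_SNPS["species_C"])` walks the rules and returns the matching species name.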

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65b3/5846911/113877633ae4/10.1177_1176934318760856-fig1.jpg
