Classification of nucleotide sequences using support vector machines.

Affiliation

Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, University of Tokyo, 1-1-1 Yayoi Bunkyo-Ku, Tokyo, 113-8657, Japan.

Publication information

J Mol Evol. 2010 Oct;71(4):250-67. doi: 10.1007/s00239-010-9380-9. Epub 2010 Aug 26.

DOI:10.1007/s00239-010-9380-9
PMID:20740280
Abstract

Species identification is one of the most important issues in biological studies. Due to recent increases in the amount of genomic information available and the development of DNA sequencing technologies, the applicability of using DNA sequences to identify species (commonly referred to as "DNA barcoding") is being tested in many areas. Several methods have been suggested to identify species using DNA sequences, including similarity scores, analysis of phylogenetic and population genetic information, and detection of species-specific sequence patterns. Although these methods have demonstrated good performance under a range of circumstances, they also have limitations, as they are subject to loss of information, require intensive computation and are sensitive to model mis-specification, and can be difficult to evaluate in terms of the significance of identification. Here, we suggest a new DNA barcoding method in which support vector machine (SVM) procedures are adopted. Our new method is nonparametric and thus is expected to be robust for a wide range of evolutionary scenarios as well as multilocus analyses. Furthermore, we describe bootstrap procedures that can be used to test the significances of species identifications. We implemented a novel conversion technique for transforming sequence data to real-valued vectors, and therefore, bootstrap procedures can be easily combined with our SVM approach. In this study, we present the results of simulation studies and empirical data analyses to demonstrate the performance of our method and discuss its properties.
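The abstract does not spell out the conversion technique or the exact bootstrap scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes aligned sequences, a plain one-hot encoding (four real values per site) as a stand-in for the paper's conversion, a two-species linear SVM fitted by stochastic sub-gradient descent on the hinge loss, and a bootstrap that resamples alignment sites with replacement. All function names, parameter values, and the toy sequences are hypothetical.

```python
import random

BASES = "ACGT"

def encode(seq):
    """One-hot encode an aligned DNA sequence: four real values per site.
    (Assumed encoding; the paper's own conversion is not given in the abstract.)"""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]

def score(w, x):
    """Linear decision value w . x."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_svm(X, y, lam=0.01, eta=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM fitted by stochastic sub-gradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1. No explicit intercept:
    one-hot vectors of equal-length sequences have a constant sum, which lets
    the weights absorb a bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Hinge term contributes only when the margin is violated.
            step = eta * y[i] if y[i] * score(w, X[i]) < 1.0 else 0.0
            w = [(1.0 - eta * lam) * wj + step * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Return +1 or -1 according to the sign of the decision value."""
    return 1 if score(w, x) >= 0.0 else -1

def bootstrap_support(train_seqs, labels, query, replicates=50, seed=1):
    """Resample alignment sites with replacement, retrain, and report the
    fraction of replicates that reproduce the full-data assignment."""
    rng = random.Random(seed)
    n_sites = len(query)
    full_w = train_svm([encode(s) for s in train_seqs], labels)
    full_call = predict(full_w, encode(query))
    agree = 0
    for r in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resample = lambda s: "".join(s[c] for c in cols)
        w = train_svm([encode(resample(s)) for s in train_seqs], labels, seed=r)
        agree += predict(w, encode(resample(query))) == full_call
    return full_call, agree / replicates

# Toy reference set (hypothetical): species X = +1, species Y = -1.
# Sites 0-5 differ between species; sites 6-9 vary within species.
train_seqs = ["ACGTACAACC", "ACGTACAACG", "ACGTACTACC",
              "TGCATGAACC", "TGCATGTACG", "TGCATGAACG"]
labels = [1, 1, 1, -1, -1, -1]

call, support = bootstrap_support(train_seqs, labels, "ACGTACTACG")
print("assigned label:", call, "bootstrap support:", support)
```

Because every sequence becomes a fixed-length real-valued vector, resampling sites amounts to resampling blocks of vector components, which is why a bootstrap combines easily with this kind of classifier. A real analysis would more likely use an established SVM library and would need a multi-class scheme for more than two species.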
