Classification of genomic islands using decision trees and their ensemble algorithms.

Affiliation

Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA.

Publication information

BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2164-11-S2-S1.

DOI:10.1186/1471-2164-11-S2-S1
PMID:21047376
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2975412/
Abstract

BACKGROUND

Genomic islands (GIs) are clusters of alien genes that are present in some bacterial genomes but not in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs remains far from satisfactory.

RESULTS

In this paper, we combined multiple GI-associated features, then applied and compared various machine learning approaches to evaluate classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, and Streptococcus) and on a mixed dataset of all three genera. The experimental results showed that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost, and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.
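To make the idea of an ensemble built on decision-tree base classifiers concrete, the following is a minimal plain-Python sketch of bagging only. It is not the authors' implementation: J48 (Weka's C4.5 implementation) is replaced here by a depth-1 decision stump, the two numeric features and the labels are synthetic stand-ins for GI-associated features and GI/non-GI classes, and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a depth-1 tree: pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            # An empty right side (t is the maximum value) reuses the left label.
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_bagging(X, y, n_estimators, rng):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Combine the base classifiers by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

def accuracy(predict, X, y):
    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)

# Synthetic data (an assumption for illustration): label 1 when the two features sum above 1.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(300)]
y = [1 if a + b > 1.0 else 0 for a, b in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

single = fit_stump(X_train, y_train)
ensemble = fit_bagging(X_train, y_train, n_estimators=25, rng=rng)

single_acc = accuracy(lambda row: predict_stump(single, row), X_test, y_test)
bagged_acc = accuracy(lambda row: predict_bagging(ensemble, row), X_test, y_test)
print(f"single stump accuracy: {single_acc:.2f}")
print(f"bagged stumps accuracy: {bagged_acc:.2f}")
```

Whether the bagged model beats the single stump depends on the data and the base learner, so the sketch prints both accuracies rather than assuming a winner. AdaBoost, MultiBoost, and random forest differ in how they reweight or randomize the training of each base classifier and are not shown here.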

CONCLUSIONS

We conclude that decision-tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7094/2975412/436a6f340003/1471-2164-11-S2-S1-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7094/2975412/8c1851e74464/1471-2164-11-S2-S1-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7094/2975412/3911875c7e7c/1471-2164-11-S2-S1-2.jpg
