Prediction of prokaryotic transposases from protein features with machine learning approaches.

Affiliations

Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Affiliated Hospital of Shanghai University, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, PR China.

Department of Clinical Laboratory, The Second Affiliated Hospital of Guizhou Medical University, Kaili, PR China.

Publication information

Microb Genom. 2021 Jul;7(7). doi: 10.1099/mgen.0.000611.

DOI: 10.1099/mgen.0.000611
PMID: 34309504
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8477400/
Abstract

Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset comprising 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures using a combination of mutual information and the least absolute shrinkage and selection operator (LASSO) algorithm. By aggregating these signatures, we developed an ensemble classifier that integrates a collection of individual ML-based classifiers to identify Tnps. Further validation showed that this classifier achieved good performance, with an average AUC of 0.955, and matched or exceeded other common methods. Based on this ensemble classifier, a stand-alone command-line tool named TnpDiscovery was built to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps.
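
The three ideas named in the abstract (mutual-information feature scoring, aggregating several classifiers, and AUC as the performance measure) can be illustrated with a minimal, standard-library-only sketch. This is not the TnpDiscovery implementation: the toy feature values and classifier scores are invented for illustration, the LASSO step is omitted, and averaging the base classifiers' scores ("soft voting") is an assumption, since the abstract does not say how the individual classifiers are combined.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def soft_vote(score_lists):
    """Average the per-sample scores of several base classifiers (assumed scheme)."""
    return [sum(sample) / len(sample) for sample in zip(*score_lists)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy balanced data set: 1 = transposase, 0 = control (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
feat_a = [1, 1, 1, 0, 0, 0]   # perfectly informative feature
feat_b = [1, 0, 1, 0, 1, 0]   # weakly informative feature

# Rank features by mutual information with the label, highest first.
ranking = sorted(
    {"feat_a": feat_a, "feat_b": feat_b}.items(),
    key=lambda kv: mutual_information(kv[1], labels),
    reverse=True,
)

# Hypothetical scores from two base classifiers, then their averaged ensemble.
clf1 = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
clf2 = [0.7, 0.6, 0.7, 0.3, 0.4, 0.2]
ensemble = soft_vote([clf1, clf2])

print([name for name, _ in ranking])
print(auc(clf1, labels), auc(clf2, labels), auc(ensemble, labels))
```

In this toy case the first classifier misranks one positive/negative pair, and averaging with the second classifier corrects it. Real pipelines would typically use a library implementation of these steps and evaluate on held-out data.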

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2780/8477400/f30f550a6a26/mgen-7-0611-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2780/8477400/438604e04f5a/mgen-7-0611-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2780/8477400/bfbd63f1315a/mgen-7-0611-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2780/8477400/34fa4fbe7729/mgen-7-0611-g004.jpg
