Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Affiliated Hospital of Shanghai University, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, PR China.
Department of Clinical Laboratory, The Second Affiliated Hospital of Guizhou Medical University, Kaili, PR China.
Microb Genom. 2021 Jul;7(7). doi: 10.1099/mgen.0.000611.
Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset comprising 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures by combining the mutual information and least absolute shrinkage and selection operator (LASSO) algorithms. By aggregating these signatures, we developed an ensemble classifier that integrates a collection of individual ML-based classifiers to identify Tnps. Further validation showed that this classifier achieved good performance, with an average AUC of 0.955, and matched or exceeded other common methods. Based on this ensemble classifier, we established a stand-alone command-line tool, TnpDiscovery, to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps in the future.
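The workflow summarized in the abstract (rank features by mutual information, aggregate several classifiers, evaluate by AUC) can be illustrated with a minimal sketch. This is not the TnpDiscovery implementation: the toy binary features, the two sets of base-classifier scores, and all function names are hypothetical, the LASSO selection step is omitted for brevity, and simple score averaging (soft voting) stands in for whatever aggregation rule the authors used.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_top_k(columns, labels, k):
    """Return the indices of the k feature columns with the highest MI."""
    ranked = sorted(
        range(len(columns)),
        key=lambda j: mutual_information(columns[j], labels),
        reverse=True,
    )
    return sorted(ranked[:k])


def soft_vote(score_lists):
    """Average the per-sample scores produced by several base classifiers."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = Tnp, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly informative feature
    [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative feature
    [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative feature
]
selected = select_top_k(columns, labels, k=2)

# Hypothetical probability scores from two base classifiers.
clf_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
clf_b = [0.8, 0.6, 0.7, 0.5, 0.4, 0.2, 0.3, 0.1]
ensemble = soft_vote([clf_a, clf_b])

print("selected feature indices:", selected)
print("AUC of clf_a alone:", auc(clf_a, labels))
print("AUC of the ensemble:", auc(ensemble, labels))
```

In a realistic setting, the features would be continuous protein descriptors, the mutual information would be estimated accordingly, and the AUC would be averaged over cross-validation folds or independent test sets rather than computed once on training data.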