NUS-ISS, National University of Singapore, 119615, Singapore.
Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan.
Comput Biol Med. 2024 Jan;168:107662. doi: 10.1016/j.compbiomed.2023.107662. Epub 2023 Nov 3.
This study introduces VF-Pred, a framework for detecting virulence factors (VFs) from genomic data. VFs enable pathogens to infect host tissue and evade the immune system, leading to infectious disease, so identifying them accurately is important for developing effective drugs and vaccines. VF-Pred combines several feature engineering techniques to generate inputs for distinct machine learning classification models, and a final downstream model consolidates the predictions of these models using a new ensembling approach. A notable component is a novel Seq-Alignment feature, which significantly improves the accuracy of the machine learning algorithms used. The framework was trained on 982 features obtained from feature engineering, using an ensemble of 25 models. The downstream ensembling technique outperforms existing stacking strategies and other ensembling methods in VF detection. Compared with earlier studies of this kind, VF-Pred achieves higher accuracy (83.5%) and higher sensitivity (87%) in identifying VFs. VF-Pred is available through a user-friendly web page: given an identifier and a protein sequence, it predicts whether the protein has a high or low likelihood of being a VF. Overall, VF-Pred offers a promising methodology for identifying VFs and may support the development of more effective strategies against infectious diseases.
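The abstract does not say which features or downstream model VF-Pred uses. The sketch below therefore only illustrates the general pattern it describes: sequence-derived features feed base classifiers, and a downstream model combines the base classifiers' outputs. It shows plain stacking with a logistic-regression meta-learner, not the paper's own ensembling technique. Amino acid composition is used as a generic example feature, and every function name and the toy probabilities are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: a generic example feature, not necessarily one of the
    982 features used by VF-Pred.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def meta_predict(row, weights, bias):
    """Combine one row of base-model probabilities into a final probability."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_meta_model(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    base_probs: one row per protein, one column per base model, ideally
    out-of-fold predictions so the meta-learner does not see leaked labels.
    """
    n_models = len(base_probs[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            err = meta_predict(row, weights, bias) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data (hypothetical): probabilities from three base models for
# four proteins, with labels 1 = VF and 0 = non-VF.
toy_base_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.2],
]
toy_labels = [1, 1, 0, 0]

weights, bias = fit_meta_model(toy_base_probs, toy_labels)
print(meta_predict([0.85, 0.7, 0.8], weights, bias))
print(meta_predict([0.15, 0.3, 0.2], weights, bias))
```

In a real pipeline each base model would be trained on its own feature set, and the meta-learner would be fitted on cross-validated predictions rather than on hand-written probabilities.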