School of Mathematics and Statistics, Yunnan University, Kunming, Yunnan, China.
PeerJ. 2023 Jan 17;11:e14667. doi: 10.7717/peerj.14667. eCollection 2023.
Bacterial vaginosis (BV) is one of the most common diseases among women of reproductive age, yet its etiology remains unknown. In this study, we modeled the temporal samples of the vaginal microbiome as a network and investigated the relationship between the network edges and BV. We used two feature selection algorithms, decision tree (DT) and ReliefF (RF), to select the network feature edges associated with BV, and then validated these feature edges with logistic regression (LR) and support vector machine (SVM) classifiers. The results show that: (1) machine learning can distinguish vaginal community states (BV, ABV, SBV, and HEA) from only a few feature edges; (2) selecting the five most important feature edges gives the best accuracy for the feature selection and classification models; (3) the feature edges selected by DT outperform those selected by RF under both the LR and SVM classifiers, and LR with DT-selected feature edges is more suitable for diagnosing BV; (4) the two feature selection algorithms rank the importance of edges differently; and (5) the feature edges selected by DT and RF cannot be assembled into a sub-network associated with BV. In short, the feature edges selected by our method can serve as indicators for personalized diagnosis of BV and can help clarify a more mechanistic interpretation of its etiology.
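The edge-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `relieff_scores`, the neighbour count `k`, the cut-off `top_k`, and the synthetic "edge weight" data are all assumptions made for illustration, and the scoring is a simplified ReliefF-style rule (k nearest hits and misses, misses weighted by class prior) with no handling of missing values. Note that RF in the abstract denotes ReliefF, not random forest.

```python
import random


def relieff_scores(X, y, k=3):
    """Simplified ReliefF-style feature weights (illustrative only).

    X is a list of numeric rows (one column per candidate edge),
    y is a list of class labels. Higher weight = more discriminative.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
            for j in range(d)]
    prior = {c: y.count(c) / n for c in set(y)}

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        for c in prior:
            # k nearest neighbours of sample i inside class c (excluding i).
            idx = sorted((m for m in range(n) if m != i and y[m] == c),
                         key=lambda m: dist(X[i], X[m]))[:k]
            if not idx:
                continue
            # Nearest hits lower the weight; nearest misses raise it,
            # scaled by the prior of the miss class.
            coef = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for m in idx:
                for j in range(d):
                    w[j] += coef * diff(X[i], X[m], j) / (len(idx) * n)
    return w


# Synthetic stand-in data (hypothetical): column 0 tracks the label,
# columns 1-3 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + rng.gauss(0, 0.1), rng.random(), rng.random(), rng.random()]
     for label in y]

scores = relieff_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
top_k = 2  # hypothetical cut-off; the abstract reports five edges as best
selected = ranking[:top_k]
print("edge ranking:", ranking, "selected:", selected)
```

In a full pipeline, the columns in `selected` would then be passed to a classifier such as LR or SVM, and the accuracy compared across different values of `top_k`.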