Canderan Jamie, Stamboulian Moses, Ye Yuzhen
Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408 USA.
Health Inf Sci Syst. 2025 Aug 29;13(1):54. doi: 10.1007/s13755-025-00369-z. eCollection 2025 Dec.
The gut microbiome plays a fundamental role in human health and disease. Individual variations in the microbiome and the corresponding functional implications are key considerations for enhancing precision health and medicine. Metaproteomics has recently revealed protein expression that might be associated with human health and disease. Existing studies have focused on either the human proteins or the bacterial proteins that can be identified from (meta)proteomics data sets, but not both. In this study, we examined the feasibility of identifying both human and bacterial proteins that are differentially expressed between healthy and diseased individuals from metaproteomics data sets. We further evaluated different strategies for using identified peptides and proteins to build predictive models. By leveraging existing metaproteomics data sets and a tool that we have developed for metaproteomics data analysis (MetaProD), we were able to derive both human and bacterial differentially expressed proteins that could serve as potential biomarkers for all diseases we studied. We also built predictive models that use identified peptides and proteins as features for predicting human diseases. Our results showed that models built on peptide-based identifications are often more accurate than those built on protein-based ones, and that feature selection can offer improvements. Prediction accuracy could be further improved, in some cases, by including bacterial identifications, but missing data in bacterial identifications remains problematic.