Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA.
University of Kansas Cancer Center, Kansas City, USA.
Mol Omics. 2024 Jun 10;20(5):348-358. doi: 10.1039/d4mo00008k.
Omics data sets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these data sets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet there has been limited exploration of bioinformatics tools that integrate ML techniques to help researchers understand the underlying biology. Expanding upon our previously developed computational framework for an integrative machine learning approach, we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which are used to visually organize input features into a persistent structure of selected, unselected, and fluctuating categories, helping researchers generate meaningful hypotheses about the underlying biology. We have evaluated PerSEveML on three diverse, biologically complex data sets, ranging from small to large scale and containing extremely rare events, and have demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
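To make the idea of sorting features into selected, unselected, and fluctuating categories more concrete, the following is a minimal illustrative sketch in Python. It is not PerSEveML's implementation, and the abstract does not specify how the entropy or rank scores are defined. The sketch assumes that each ML method casts a binary vote (1 = feature selected, 0 = not selected) and that the entropy is the Shannon entropy of the fraction of methods selecting a feature. The function names `binary_entropy` and `categorize_features` and the feature names are hypothetical, and rank scores are not modeled here.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli variable with success probability p."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def categorize_features(votes_by_feature):
    """Assign each feature to a category based on agreement across ML methods.

    votes_by_feature maps a feature name to a list of 0/1 votes, one per
    ML method (an assumed input format, not taken from the paper).
    Returns a dict mapping feature name -> (entropy, category).
    """
    result = {}
    for feature, votes in votes_by_feature.items():
        fraction_selected = sum(votes) / len(votes)
        entropy = binary_entropy(fraction_selected)
        if entropy > 0.0:
            # Methods disagree, so the feature's status is unstable.
            category = "fluctuating"
        elif fraction_selected == 1.0:
            category = "selected"
        else:
            category = "unselected"
        result[feature] = (entropy, category)
    return result


# Hypothetical votes from four ML methods for three features.
example_votes = {
    "feature_a": [1, 1, 1, 1],
    "feature_b": [0, 0, 0, 0],
    "feature_c": [1, 0, 1, 0],
}
categories = categorize_features(example_votes)
for name, (entropy, category) in categories.items():
    print(f"{name}: entropy={entropy:.2f}, category={category}")
```

Under these assumptions, a feature on which all methods agree has zero entropy and falls cleanly into the selected or unselected group, while any disagreement yields positive entropy and marks the feature as fluctuating.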