Gerolami Justin, Wong Justin Jong Mun, Zhang Ricky, Chen Tong, Imtiaz Tashifa, Smith Miranda, Jamaspishvili Tamara, Koti Madhuri, Glasgow Janice Irene, Mousavi Parvin, Renwick Neil, Tyryshkin Kathrin
School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada.
Department of Pathology and Molecular Medicine, Queen's University, Kingston, ON K7L 3N6, Canada.
Diagnostics (Basel). 2022 Aug 18;12(8):1997. doi: 10.3390/diagnostics12081997.
'-Omics' profiling frequently produces complex, high-dimensional datasets that are challenging to analyze. Typically, these datasets contain more genomic features than samples, which limits the use of multivariable statistical and machine learning-based approaches to analysis. Effective alternative approaches are therefore urgently needed to identify features of interest in '-omics' data. In this study, we present the molecular feature selection tool, a novel, ensemble-based feature selection application for identifying candidate biomarkers in '-omics' data. As proof of principle, we applied the molecular feature selection tool to identify a small set of immune-related genes as potential biomarkers of three prostate adenocarcinoma subtypes. We then tested the selected genes in a model to classify the three subtypes and compared the results with those of models built using all genes and all differentially expressed genes. The model built on genes identified with the molecular feature selection tool outperformed the other models in this study on every comparison metric (accuracy, precision, recall, and F1-score) while using a significantly smaller set of genes. In addition, we developed a simple graphical user interface for the molecular feature selection tool, which is available for free download. This user-friendly interface is a valuable tool for identifying potential biomarkers in gene expression datasets and an asset for biomarker discovery studies.