Computational Centre, University of Białystok, ul. Konstantego Ciołkowskiego 1M, 15-245 Białystok, Poland.
Department of Human Biology, Institute of Biology, Faculty of Biological and Veterinary Sciences, Nicolaus Copernicus University, ul. Lwowska 1, 87-100 Toruń, Poland.
Int J Mol Sci. 2022 Jun 24;23(13):7057. doi: 10.3390/ijms23137057.
In bladder cancer, carcinoma in situ (CIS) is known to be difficult to diagnose, yet few studies have examined the biomarkers relevant to CIS development. Omics experiments generate data with tens of thousands of descriptive variables, e.g., gene expression levels. Often, many of these variables are identified as relevant in some way, leaving hundreds or thousands of candidate variables for model building or further analysis. We analyze one such dataset describing patients with bladder cancer, mostly non-muscle-invasive bladder cancer (NMIBC), and propose a novel approach to feature selection. The approach returns high-quality features for prediction while remaining interpretable and offering some insight into the analyzed data. As a result, we obtain a small set of the seven most useful biomarkers for diagnostics. These biomarkers could also be used to build tests that avoid the costly and time-consuming existing methods. We summarize the current biological knowledge of the chosen biomarkers and contrast it with our findings.