Zhao Enyang, Li Xuedong, You Bosen, Wang Jinpeng, Hou Wenbin, Wu Qiong
School of Life Science and Technology, Harbin Institute of Technology, Harbin, China.
The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, Harbin, China.
Front Genet. 2022 Apr 20;13:857411. doi: 10.3389/fgene.2022.857411. eCollection 2022.
Kidney renal clear cell carcinoma is a common subtype that accounts for 70-80% of renal cell carcinoma cases and readily leads to metastasis and death. A reliable signature for diagnosing this cancer is needed, so we seek to select miRNAs that identify kidney renal clear cell carcinoma. We adopt and improve a feature selection strategy for this purpose. Samples of kidney renal clear cell carcinoma and normal tissue are split into training and testing groups. For each miRNA, an accumulated score representing its variable importance is derived by iterating resampling, training, and scoring. The miRNAs with higher scores are then selected using a Gaussian mixture model. The sample split is repeated ten times to obtain a core set of miRNAs. A total of 611 samples, each profiling 1,343 miRNAs, are downloaded from TCGA. The improved feature selection method is applied, and five miRNAs are identified as a biomarker signature for diagnosing kidney renal clear cell carcinoma. GSE151419 and GSE151423 are used as independent testing sets. Results on these sets, together with data-driven measurements and knowledge-driven evidence, indicate the effectiveness of the selected signature.
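The resample, score, accumulate, and mixture-model steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or the scoring rule, so a standardized difference of class means stands in for a trained model's variable importance, the two-component Gaussian mixture is fitted with a hand-written EM loop, and the data are synthetic. All function names (`separation_score`, `accumulate_scores`, `select_high_component`) and parameters are hypothetical. The ten repeated sample splits are omitted for brevity.

```python
import math
import random
import statistics


def separation_score(values, labels):
    """Stand-in importance (assumption): standardized gap between class means."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if len(pos) < 2 or len(neg) < 2:
        return 0.0
    spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-9
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def accumulate_scores(X, y, n_rounds=30, seed=0):
    """Bootstrap-resample the samples, score every feature, and sum the scores."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    totals = [0.0] * n_features
    for _ in range(n_rounds):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        labels = [y[i] for i in idx]
        for j in range(n_features):
            totals[j] += separation_score([X[i][j] for i in idx], labels)
    return totals


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def select_high_component(scores, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM and return the
    indices of the scores assigned to the component with the higher mean."""
    means = [min(scores), max(scores)]
    start_var = statistics.pvariance(scores) + 1e-6
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = (
                sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
                + 1e-6  # variance floor keeps the density finite
            )
            weights[k] = nk / len(scores)
    high = 0 if means[0] > means[1] else 1
    return [j for j, r in enumerate(resp) if r[high] > 0.5]


# Synthetic toy data (assumption): 60 samples, 12 features,
# and only features 0 and 1 differ between the two classes.
data_rng = random.Random(1)
y = [1] * 30 + [0] * 30
X = [
    [data_rng.gauss(4.0 if (lab == 1 and j < 2) else 0.0, 1.0) for j in range(12)]
    for lab in y
]

scores = accumulate_scores(X, y)
selected = select_high_component(scores)
print("selected feature indices:", selected)
```

Using a mixture model instead of a fixed top-k cut lets the score distribution itself decide how many features fall into the high-importance group, which is one plausible reason for that choice; the abstract does not state the rationale.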