Qiu Wang-Ren, Qi Bei-Bei, Lin Wei-Zhong, Zhang Shou-Hua, Yu Wang-Ke, Huang Shun-Fa
Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China.
Department of General Surgery, Jiangxi Provincial Children's Hospital, Nanchang, China.
Front Genet. 2022 Jun 30;13:926927. doi: 10.3389/fgene.2022.926927. eCollection 2022.
The early symptoms of lung adenocarcinoma are subtle, and clinical diagnosis relies primarily on X-ray examination and pathological section examination. With the development of bioinformatics, biomarker discovery offers another route to diagnosis. However, diagnosis is neither accurate nor trustworthy when it rests on omics data with high-dimension, low-sample-size (HDLSS) characteristics or on biomarkers derived from a single type of omics data. To address these problems, feature selection methods from biological analysis are used to reduce the dimensionality of gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996). The Cartesian product method is then used to expand the sample set and to integrate the gene expression and DNA methylation data. A classifier is built with a deep neural network and evaluated by K-fold cross-validation. Gene ontology analysis and literature retrieval are used to assess the biological relevance of the selected genes, and Kaplan-Meier survival analysis on the TCGA database is applied to these candidate genes to explore the molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma, and they are considered biomarkers of the disease.
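The Cartesian product step can be illustrated with a minimal sketch. The abstract does not give the pairing rule, so the version below assumes that each gene expression sample is paired with every DNA methylation sample of the same class and that the two feature vectors are concatenated. The function name `cartesian_integrate` and the toy values are hypothetical and are not taken from the paper.

```python
from itertools import product


def cartesian_integrate(expr, expr_labels, meth, meth_labels):
    """Pair expression and methylation samples class by class.

    Assumption (not stated in the abstract): only samples sharing a class
    label are paired, and a fused sample is the expression features
    followed by the methylation features.
    """
    fused, labels = [], []
    # Only classes present in both data sets can be paired.
    for cls in sorted(set(expr_labels) & set(meth_labels)):
        e_rows = [row for row, y in zip(expr, expr_labels) if y == cls]
        m_rows = [row for row, y in zip(meth, meth_labels) if y == cls]
        # Cartesian product: every expression row with every methylation row.
        for e, m in product(e_rows, m_rows):
            fused.append(list(e) + list(m))
            labels.append(cls)
    return fused, labels


# Toy data: 3 expression samples and 3 methylation samples, 2 features each.
expr = [[1, 2], [3, 4], [5, 6]]
expr_labels = [0, 0, 1]
meth = [[10, 20], [30, 40], [50, 60]]
meth_labels = [0, 1, 1]

fused, labels = cartesian_integrate(expr, expr_labels, meth, meth_labels)
print(len(fused), len(fused[0]))
```

Under this assumption a class with n expression samples and m methylation samples yields n × m fused samples, which is how the product enlarges a small sample set while combining the two omics layers. Because fused samples share their source rows, a cross-validation split would need to be made with care so that the same original sample does not appear in both training and test folds.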