Classification of Long Non-Coding RNAs Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach.

Authors

Anuntakarun Songtham, Khamjerm Jakkrit, Tangkijvanich Pisit, Chuaypen Natthaya

Affiliations

Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.

Biomedical Engineering Program, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand.

Publication information

Bioinform Biol Insights. 2024 Jun 5;18:11779322241258586. doi: 10.1177/11779322241258586. eCollection 2024.

Abstract

Long non-coding RNAs (lncRNAs), RNA sequences longer than 200 nucleotides, play a crucial role in regulating gene expression and biological processes associated with cancer development and progression. Liver cancer is a major cause of cancer-related mortality worldwide, notably in Thailand. Although machine learning has been used extensively to analyze RNA-sequencing data, its application to identifying lncRNAs as molecular biomarkers in liver cancer remains comparatively limited. In this study, our objective was to identify candidate lncRNAs in liver cancer. We used an lncRNA expression data set from patients with liver cancer, comprising 40 699 lncRNAs sourced from the CancerLivER database. Various feature selection methods and machine-learning approaches were used to identify these candidate lncRNAs. The random forest algorithm, using features extracted from the database, achieved an area under the curve (AUC) of 0.840 for distinguishing early-stage (stage 1) from late-stage (stages 2, 3, and 4) liver cancer. Five of 23 significant lncRNAs (WAC-AS1, MAPKAPK5-AS1, ARRDC1-AS1, AC133528.2, and RP11-1094M14.11) were differentially expressed between the early and late stages of liver cancer. Based on the Gene Expression Profiling Interactive Analysis (GEPIA) database, higher expression of WAC-AS1, MAPKAPK5-AS1, and ARRDC1-AS1 was associated with shorter overall survival. In conclusion, the classification model could predict the early and late stages of liver cancer from the expression signature of lncRNA genes. The identified lncRNAs might serve as early diagnostic and prognostic biomarkers for patients with liver cancer.
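As an illustration of the evaluation metric only, the following minimal Python sketch shows how the early/late labelling described in the abstract (stage 1 versus stages 2 to 4) and an AUC could be computed. It is not the authors' pipeline: the function names `stage_to_label` and `roc_auc`, the toy stages, and the toy scores are all hypothetical, and the scores stand in for whatever probabilities a trained classifier such as a random forest would output.

```python
from itertools import product


def stage_to_label(stage: int) -> int:
    """Binarize tumor stage: stage 1 -> 0 (early), stages 2-4 -> 1 (late)."""
    if stage not in (1, 2, 3, 4):
        raise ValueError(f"unexpected stage: {stage}")
    return 0 if stage == 1 else 1


def roc_auc(labels, scores):
    """AUC as the fraction of (late, early) pairs in which the late-stage
    sample receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: tumor stages and classifier scores for 8 samples.
stages = [1, 1, 1, 2, 3, 4, 2, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.30, 0.20]

labels = [stage_to_label(s) for s in stages]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Read this way, an AUC of 0.840 would mean that a randomly chosen late-stage sample is scored above a randomly chosen early-stage sample about 84% of the time.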

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a94/11155358/c95905a8b483/10.1177_11779322241258586-fig1.jpg
