Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets.

Author information

Zhang Yu-Hang, Huang Tao, Chen Lei, Xu YaoChen, Hu Yu, Hu Lan-Dian, Cai Yudong, Kong Xiangyin

Affiliations

Department of General Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China.

Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China.

Publication information

Oncotarget. 2017 Sep 15;8(50):87494-87511. doi: 10.18632/oncotarget.20903. eCollection 2017 Oct 20.

Abstract

Detection and diagnosis of cancer are especially important for early prevention and effective treatment. Traditional methods of cancer detection are usually time-consuming and expensive. Liquid biopsy, a recently proposed noninvasive detection approach, can improve the accuracy and reduce the cost of detection by drawing on a personalized expression profile. However, few studies have analyzed this type of data, although such analyses could lead to more effective methods for detecting different cancer subtypes. In this study, we applied several reliable machine learning algorithms to data from patients who had one of six cancer subtypes (breast cancer, colorectal cancer, glioblastoma, hepatobiliary cancer, lung cancer and pancreatic cancer) and from healthy persons. Each sample was encoded by its quantitative gene expression profile, and the profiles were analyzed with the maximum relevance minimum redundancy (mRMR) method, which yielded two ranked lists of genes: the mRMR feature list and the MaxRel feature list. The incremental feature selection method was applied to the mRMR feature list to extract the optimal feature subset, on which a support vector machine gave its best performance in distinguishing the cancer subtypes and healthy controls. Ten-fold cross-validation of the resulting optimal classification model yielded an overall accuracy of 0.751. Separately, we extracted the top eighteen features (genes) of the MaxRel feature list, including TTN, RHOH, RPS20 and TRBC2, and analyzed them in detail. The results indicate that these genes could be important biomarkers for discriminating different cancer subtypes and healthy controls.
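
The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ranked feature list is already available (the mRMR ranking itself is not computed here), it substitutes a simple nearest-centroid classifier for the support vector machine so that the example needs only the Python standard library, and all function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (an assumption; the study used an SVM)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Assign each test sample to the class with the closest centroid.
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]


def cross_val_accuracy(X, y, n_folds=10, seed=0):
    """Overall accuracy pooled over the held-out folds of k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, n_folds=10):
    """Score the top-k ranked features for k = 1..N and keep the best-scoring k."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        acc = cross_val_accuracy(X_k, y, n_folds)
        if acc > best_acc:  # ties keep the smaller feature subset
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the two classes, the rest is noise.
    rng = random.Random(42)
    labels = [i % 2 for i in range(40)]
    samples = [[lab * 5.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
               for lab in labels]
    subset, accuracy = incremental_feature_selection(samples, labels, [0, 1, 2])
    print("selected features:", subset, "cross-validated accuracy:", accuracy)
```

In this sketch the feature subset grows one ranked feature at a time, and the subset with the highest cross-validated accuracy is kept, which mirrors the way the abstract describes extracting the optimal feature subset from the mRMR feature list.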
