Artificial image objects for classification of breast cancer biomarkers with transcriptome sequencing data and convolutional neural network algorithms.

Author information

410 AI, LLC, Germantown, MD, 20876, USA.

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.

Publication information

Breast Cancer Res. 2021 Oct 10;23(1):96. doi: 10.1186/s13058-021-01474-z.

Abstract

BACKGROUND

Transcriptome sequencing has become broadly available in clinical studies. However, using these data effectively in clinical applications remains a challenge because the data are high-dimensional and expression is highly correlated between individual genes.

METHODS

We proposed a method to transform RNA sequencing data into artificial image objects (AIOs) and applied convolutional neural network (CNN) algorithms to classify these AIOs. With the AIO technique, we considered each gene as a pixel in an image and its expression level as pixel intensity. Using the GSE96058 (n = 2976), GSE81538 (n = 405), and GSE163882 (n = 222) datasets, we created AIOs for the subjects and designed CNN models to classify biomarker Ki67 and Nottingham histologic grade (NHG).
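The gene-as-pixel idea described above can be illustrated with a minimal sketch. The abstract does not state how genes are ordered, how expression values are scaled, or what image shape is used, so the row-major layout, the min-max scaling to 0-255, the zero padding, and the function name `expression_to_aio` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def expression_to_aio(expression, side=None):
    """Arrange one subject's gene-expression vector into a square 2-D grid.

    Hypothetical helper: each gene becomes one pixel and its expression
    level becomes the pixel intensity. Gene order, scaling, and padding
    are illustrative assumptions, not taken from the paper.
    """
    n = len(expression)
    if n == 0:
        raise ValueError("expression vector is empty")
    if side is None:
        # Smallest side length such that side * side >= n.
        side = math.isqrt(n - 1) + 1
    if side * side < n:
        raise ValueError("side is too small to hold every gene")

    lo, hi = min(expression), max(expression)
    span = hi - lo
    # Min-max scale to integer intensities in 0..255 (assumed scaling);
    # a constant vector maps to all zeros.
    pixels = [round(255 * (v - lo) / span) if span else 0 for v in expression]
    # Pad unused pixels with zeros so the grid is complete.
    pixels += [0] * (side * side - n)
    # Row-major reshape into a list of rows.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    # Five toy "genes" fill a 3 x 3 grid, with four padded pixels.
    for row in expression_to_aio([0.0, 2.0, 4.0, 8.0, 10.0]):
        print(row)
```

A grid built this way could be passed to any image classifier. Because a CNN is sensitive to which pixels are adjacent, the choice of gene ordering would matter in practice, and the abstract does not specify it.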

RESULTS

With fivefold cross-validation, we achieved a classification accuracy of 0.821 ± 0.023 and an AUC of 0.891 ± 0.021 for Ki67 status. For NHG, the weighted average of categorical accuracy was 0.820 ± 0.012, and the weighted average of AUC was 0.931 ± 0.006. With GSE96058 as training data and GSE81538 as testing data, the accuracy and AUC for Ki67 were 0.826 ± 0.037 and 0.883 ± 0.016, and those for NHG were 0.764 ± 0.052 and 0.882 ± 0.012, respectively. These results were 10% better than the results reported in the original studies. For Ki67, the calls generated by our models predicted survival better than the calls from trained pathologists in survival analyses.
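To make the reporting format concrete, the sketch below shows one way per-fold accuracy and AUC could be computed and summarized as mean ± spread. It assumes that the ± values are standard deviations across folds, which the abstract does not state. The splitting scheme (shuffled, unstratified) and the helper names `kfold_indices`, `accuracy`, `auc`, and `summarize` are hypothetical, and no model is trained here.

```python
import random
import statistics


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the probability that a random positive scores higher than
    a random negative, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def summarize(per_fold_values):
    """Return (mean, sample standard deviation) across folds."""
    return statistics.mean(per_fold_values), statistics.stdev(per_fold_values)


if __name__ == "__main__":
    folds = kfold_indices(n_samples=10, k=5)
    print(len(folds), "folds of sizes", [len(f) for f in folds])
    # Toy per-fold accuracies, not values from the paper.
    mean, sd = summarize([0.80, 0.82, 0.84, 0.81, 0.83])
    print(f"{mean:.3f} ± {sd:.3f}")
```

In a real evaluation each fold would serve once as the held-out test set while a model is fitted on the remaining four, and a stratified split would usually be preferred when classes are imbalanced.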

CONCLUSIONS

We demonstrated that RNA sequencing data could be transformed into AIOs and used to classify Ki67 status and NHG with CNN algorithms. The AIO method could handle high-dimensional data with highly correlated variables, with no need for variable selection. With the AIO technique, a data-driven, consistent, and automation-ready model could be developed to classify biomarkers with RNA sequencing data and provide more efficient care for cancer patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b571/8504079/c1d3585f3662/13058_2021_1474_Fig1_HTML.jpg
