Classification models for Invasive Ductal Carcinoma Progression, based on gene expression data-trained supervised machine learning.

Affiliation

International Centre for Genetic Engineering and Biotechnology, New Delhi, India.

Publication information

Sci Rep. 2020 Mar 5;10(1):4113. doi: 10.1038/s41598-020-60740-w.

Abstract

Early detection of breast cancer and correct determination of its stage are important for prognosis and for providing appropriate personalized clinical treatment to breast cancer patients. However, despite considerable effort and progress, there remains a need to identify the specific genomic factors responsible for, or accompanying, the progression stages of Invasive Ductal Carcinoma (IDC), which can aid in determining the correct cancer stage. We have developed two-class machine-learning classification models to differentiate the early and late stages of IDC. The prediction models are trained on RNA-seq gene expression profiles representing different IDC stages from 610 patients, obtained from The Cancer Genome Atlas (TCGA). Different supervised learning algorithms were trained and evaluated, with model learning enriched by different feature selection methods. We also developed a machine-learning classifier trained on the same datasets, with the training sets reduced to data corresponding to IDC driver genes. Based on these two classifiers, we have developed a web server, Duct-BRCA-CSP, which distinguishes early-stage from late-stage IDC based on input RNA-seq gene expression profiles. Our analysis also provides deeper insight into the stage-dependent molecular events accompanying IDC progression. The server is publicly available at http://bioinfo.icgeb.res.in/duct-BRCA-CSP.
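The general shape of such a pipeline (feature selection followed by a two-class classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than TCGA profiles, the Welch t-statistic filter and nearest-centroid classifier are simple stand-ins and not the feature selection methods or learning algorithms used by the authors, and all function names are hypothetical.

```python
import random
import statistics


def make_synthetic_profiles(n_per_class=40, n_genes=50, n_informative=5, seed=0):
    """Generate toy expression profiles (NOT TCGA data).

    Label 0 stands for early stage, label 1 for late stage. The first
    n_informative genes are shifted upward in the late-stage class.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += 2.0
            X.append(row)
            y.append(label)
    return X, y


def welch_t_scores(X, y):
    """Absolute Welch t-statistic per gene between the two classes."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
    return scores


def select_top_genes(X, y, k):
    """Filter-style feature selection: keep the k highest-scoring genes."""
    scores = welch_t_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Mean expression of the selected genes for each class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(row, genes, centroids):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def evaluate(seed=0, k=5):
    """Hold-out evaluation; genes are selected on the training split only."""
    X, y = make_synthetic_profiles(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    split = int(0.7 * len(idx))
    train, test = idx[:split], idx[split:]
    X_train = [X[i] for i in train]
    y_train = [y[i] for i in train]
    genes = select_top_genes(X_train, y_train, k)
    centroids = fit_centroids(X_train, y_train, genes)
    correct = sum(predict(X[i], genes, centroids) == y[i] for i in test)
    return genes, correct / len(test)


if __name__ == "__main__":
    selected, accuracy = evaluate()
    print("selected gene indices:", selected)
    print("hold-out accuracy:", accuracy)
```

Selecting genes on the training split only, as in `evaluate`, keeps the held-out samples from influencing which features the classifier sees. Restricting the candidate genes to a predefined list (for example, known driver genes) would amount to replacing `select_top_genes` with a fixed set of indices.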
