Subtype prediction in pediatric acute myeloid leukemia: classification using differential network rank conservation revisited.

Authors

Obulkasim Askar, Fornerod Maarten, Zwaan Michel C, Reinhardt Dirk, van den Heuvel-Eibrink Marry M

Affiliations

Department of Pediatric Oncology/Hematology, Erasmus-MC Sophia Children's Hospital, Rotterdam, The Netherlands.

Dutch Children's Oncology Group, Erasmus-MC Sophia Children's Hospital, Rotterdam, The Netherlands.

Publication

BMC Bioinformatics. 2015 Sep 23;16:305. doi: 10.1186/s12859-015-0737-3.

DOI:10.1186/s12859-015-0737-3
PMID:26399969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4580220/
Abstract

BACKGROUND

One of the most important applications of transcriptomic data is cancer phenotype classification. Many characteristics of transcriptomic data, such as redundant features and technical artifacts, make over-fitting commonplace. Promising classification results often fail to generalize across datasets with different sources, platforms, or preprocessing. Recently, a novel differential network rank conservation (DIRAC) algorithm was introduced to characterize cancer phenotypes using transcriptomic data. DIRAC is a member of a family of algorithms that have proven useful for disease classification based on the relative expression of genes. Combining the robustness of this family's simple decision rules with known biological relationships, this systems approach identifies interpretable yet highly discriminative networks. Although DIRAC was briefly applied to several classification problems in the original paper, its potential in cancer phenotype classification, and especially its robustness against artifacts in transcriptomic data, has not yet been fully characterized.
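
To make the idea of classifying on the relative expression of genes concrete, the following is a minimal sketch of a rank-template classifier for a single network. It is an illustration under stated assumptions, not the authors' implementation: the function names, the majority-vote template, and the toy expression values are all hypothetical, and real DIRAC analyses use curated gene networks and many more samples.

```python
import numpy as np
from itertools import combinations

def pairwise_orderings(expr):
    """expr: (n_genes, n_samples) array for the genes of one network.
    Returns an (n_pairs, n_samples) boolean array that is True
    where gene i is expressed below gene j in a sample."""
    pairs = combinations(range(expr.shape[0]), 2)
    return np.array([expr[i] < expr[j] for i, j in pairs])

def rank_template(expr):
    """Majority ordering of each gene pair across one phenotype's samples
    (assumed definition for this sketch)."""
    return pairwise_orderings(expr).mean(axis=1) > 0.5

def rank_matching_score(sample, template):
    """Fraction of gene pairs in a single sample that agree with a template."""
    orderings = pairwise_orderings(sample.reshape(-1, 1))[:, 0]
    return float(np.mean(orderings == template))

def classify(sample, template_a, template_b):
    """Assign the phenotype whose template the sample matches better."""
    diff = (rank_matching_score(sample, template_a)
            - rank_matching_score(sample, template_b))
    return "A" if diff > 0 else "B"

# Hypothetical toy data: 3 genes x 3 samples per phenotype.
class_a = np.array([[1.0, 2.0, 1.0],
                    [5.0, 6.0, 4.0],
                    [9.0, 8.0, 7.0]])   # gene0 < gene1 < gene2
class_b = np.array([[9.0, 8.0, 9.0],
                    [5.0, 4.0, 6.0],
                    [1.0, 2.0, 3.0]])   # gene0 > gene1 > gene2

template_a = rank_template(class_a)
template_b = rank_template(class_b)

new_sample = np.array([2.0, 3.0, 10.0])
predicted = classify(new_sample, template_a, template_b)
```

Because the decision depends only on which gene of each pair is higher, the absolute scale of `new_sample` does not matter, which is the property that makes this family of rules attractive for cross-dataset use.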

RESULTS

In this study we thoroughly investigate the potential of DIRAC by applying it to multiple datasets, and examine the variation in classification performance when datasets are (i) treated or untreated for batch effects and (ii) preprocessed with different techniques. We also propose the first DIRAC-based classifier to integrate multiple networks. We show that the DIRAC-based classifier is very robust in the examined scenarios. To our surprise, the trained DIRAC-based classifier even translated well to a dataset with different biological characteristics in the presence of substantial batch effects that, as shown here, plagued the standard expression-value-based classifier. In addition, because it integrates biological information, the DIRAC-based classifier also suggests pathways to target in specific subtypes, which may aid the establishment of personalized therapy in diseases such as pediatric AML. To better gauge the predictive power of the DIRAC-based classifier in general, we also performed classifications using publicly available breast and lung cancer datasets. Furthermore, multiple well-known classification algorithms were used to create a test bed for comparing the DIRAC-based classifier with the standard gene-expression-value-based classifier. We observed that the DIRAC-based classifier greatly outperforms its rival.
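
One plausible reason a rank-based classifier tolerates batch effects better than an expression-value classifier can be sketched numerically. The example below assumes a deliberately simple batch effect, a positive scale and an additive shift applied per sample; the data, the `within_sample_ranks` helper, and the effect sizes are hypothetical. Real batch effects can be gene-specific and may change within-sample orderings, so this shows only the favorable case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 5 genes x 4 samples.
expr = rng.normal(loc=8.0, scale=2.0, size=(5, 4))

# Simulated batch effect: each sample gets its own positive scale and shift.
scale = np.array([1.0, 1.5, 0.7, 2.0])
shift = np.array([0.0, 3.0, -2.0, 5.0])
shifted = expr * scale + shift  # broadcasts across sample columns

def within_sample_ranks(x):
    """Rank of each gene within its own sample (column), 0 = lowest."""
    return x.argsort(axis=0).argsort(axis=0)

# Gene orderings inside each sample survive the transformation...
same_ranks = bool(np.array_equal(within_sample_ranks(expr),
                                 within_sample_ranks(shifted)))
# ...while the raw values a standard classifier would see do not.
values_changed = not np.allclose(expr, shifted)
```

Under this assumption, any feature built from within-sample gene orderings is unchanged by the simulated batch effect, whereas features built from raw expression values shift with it.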

CONCLUSIONS

Based on our experiments with multiple datasets, we propose that DIRAC is a promising solution to the lack of generalizability in classification efforts that use transcriptomic data. We believe that the superior performance presented in this study may motivate others to initiate a new line of research to explore the untapped power of DIRAC in a broad range of cancer types.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/6d9dac0509d5/12859_2015_737_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/cfb1990d28e4/12859_2015_737_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/ebf6d13910c5/12859_2015_737_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/adbff3f61bd8/12859_2015_737_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/dbf2e5321670/12859_2015_737_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/be2b06489a45/12859_2015_737_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/901bcffbfdf3/12859_2015_737_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/ede3682a04b5/12859_2015_737_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a881/4580220/dae678056658/12859_2015_737_Fig9_HTML.jpg
