Data-driven characterization of molecular phenotypes across heterogeneous sample collections.

Affiliations

Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland.

Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.

Publication information

Nucleic Acids Res. 2019 Jul 26;47(13):e76. doi: 10.1093/nar/gkz281.

Abstract

Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.
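
One way to picture "assessing the contribution of biological versus technical variation to sample clustering" is to compare cluster assignments against a biological label (such as disease) and a technical label (such as study of origin). The sketch below is illustrative only and is not the method described in the paper: it uses normalized mutual information (NMI) as a stand-in agreement score, and the cluster assignments, disease labels, and study labels are hypothetical toy values.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI of two labelings, normalized by the mean of their entropies."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 0.0


# Hypothetical toy data: 8 samples, 2 clusters (e.g. found on an embedding).
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
disease = ["AML", "AML", "AML", "AML", "ALL", "ALL", "ALL", "ALL"]  # biological
study = ["s1", "s2", "s1", "s2", "s1", "s2", "s1", "s2"]  # technical

nmi_disease = normalized_mutual_information(clusters, disease)
nmi_study = normalized_mutual_information(clusters, study)
print(f"NMI(cluster, disease) = {nmi_disease:.2f}")
print(f"NMI(cluster, study)   = {nmi_study:.2f}")
```

In this toy case the clusters follow disease exactly and are independent of study, so the disease score is high and the study score is zero. On real data, a high score for the study label would suggest that technical variation is driving the clustering.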
