Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA.

Affiliations

IBM Research, Yorktown Heights, New York, United States of America.

MLL Munich Leukemia Laboratory, Munich, Germany.

Publication information

PLoS Comput Biol. 2019 Aug 30;15(8):e1007332. doi: 10.1371/journal.pcbi.1007332. eCollection 2019 Aug.

Abstract

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer whether the dark-matter of the genome encompasses signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, which is implicitly defined on the whole genome, we use predictability (F1 score), which can be defined on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter regions had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserves much more attention.
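
To make the predictability measure concrete, the following is a minimal sketch of how an F1 score can be computed for a multi-class subtype classifier. It is not the ReVeaL method and is not taken from the paper: the function names `per_class_f1` and `macro_f1` are hypothetical, and macro averaging across classes is an assumption, since the abstract does not say how per-class scores are combined.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred.

    F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

In this framing, a classifier would be trained and scored separately on features drawn from each genomic region (for example genic versus dark), and the resulting F1 values compared across regions. The labels passed in would be the cancer subtypes; any feature extraction step is outside the scope of this sketch.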

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44f6/6742441/ee548abc1dd5/pcbi.1007332.g001.jpg
