Mut2Vec: distributed representation of cancerous mutations.

Authors

Kim Sunkyu, Lee Heewon, Kim Keonwoo, Kang Jaewoo

Affiliations

Department of Computer Science and Engineering, Korea University, Seoul, Korea.

Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, Korea.

Publication information

BMC Med Genomics. 2018 Apr 20;11(Suppl 2):33. doi: 10.1186/s12920-018-0349-7.

Abstract

BACKGROUND

Embedding techniques that convert high-dimensional sparse data into low-dimensional distributed representations have been gaining popularity in many fields of research. Embeddings are widely used in deep learning models and have been shown to be more effective than naive binary representations. However, no attempt has yet been made to embed highly sparse mutation profiles into dense distributed representations. Because a binary representation does not capture biological context, its use is limited in many applications, such as discovering novel driver mutations. In addition, training distributed representations of mutations is challenging because far less biological data is available than the large text corpora used in text mining.

METHODS

We introduce Mut2Vec, a novel computational pipeline for creating distributed representations of cancerous mutations. Because cancer can be characterized by a series of co-occurring mutations, Mut2Vec is trained on cancer profiles using Skip-Gram. We also augment the pipeline with existing information from the biomedical literature and protein-protein interaction networks to compensate for the limited amount of data.
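The idea of treating co-occurring mutations as Skip-Gram context can be illustrated with a minimal sketch of how (target, context) training pairs might be built from mutation profiles. The sample names, gene lists, and the choice to use the whole sample as the context window are assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy profiles: each sample lists the genes mutated in one tumor.
profiles = {
    "sample_1": ["TP53", "KRAS", "SMAD4"],
    "sample_2": ["TP53", "PIK3CA"],
    "sample_3": ["KRAS"],
}


def skipgram_pairs(mutations):
    """Return every ordered (target, context) pair within one sample.

    Mutations in a profile have no natural order, so this sketch treats the
    whole sample as the context window (an assumption, not from the paper).
    """
    unique = sorted(set(mutations))
    return list(permutations(unique, 2))


# Collect training pairs over all samples; a sample with one mutation yields none.
pairs = [p for muts in profiles.values() for p in skipgram_pairs(muts)]

# How often each (target, context) pair occurs across the cohort.
pair_counts = Counter(pairs)

for (target, context), count in sorted(pair_counts.items()):
    print(target, context, count)
```

Pairs like these would then be fed to a Skip-Gram model, which learns vectors such that mutations sharing many contexts end up close together.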

RESULTS

To evaluate our models, we conducted two experiments: a) visualizing driver and passenger mutations, and b) identifying novel driver mutations using a clustering method. The visualization showed a clear distinction between passenger mutations and driver mutations. We also found driver mutation candidates and verified through a literature survey that these were true driver mutations. The pre-trained mutation vectors and the candidate driver mutations are publicly available at http://infos.korea.ac.kr/mut2vec .
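The clustering-based search for driver candidates can be sketched as follows. The abstract does not name the clustering method, so plain k-means is used here as a stand-in; the gene names, 2-D vectors, known-driver set, and the `min_fraction` threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical 2-D mutation vectors (real embeddings would have more dimensions).
vectors = {
    "GENE_A": (0.9, 0.8),
    "GENE_B": (1.0, 0.9),
    "GENE_C": (0.95, 0.85),
    "GENE_D": (-0.9, -0.8),
    "GENE_E": (-1.0, -0.7),
    "GENE_F": (-0.8, -0.9),
}
known_drivers = {"GENE_A", "GENE_B"}  # hypothetical labels


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start (evenly spaced sorted names)."""
    names = sorted(points)
    step = len(names) // k
    centroids = [points[names[i * step]] for i in range(k)]
    labels = {}
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = {
            n: min(range(k), key=lambda c: math.dist(points[n], centroids[c]))
            for n in names
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def driver_candidates(labels, known, min_fraction=0.5):
    """Flag unlabeled genes in clusters where known drivers reach min_fraction."""
    clusters = {}
    for name, c in labels.items():
        clusters.setdefault(c, []).append(name)
    found = set()
    for members in clusters.values():
        fraction = sum(m in known for m in members) / len(members)
        if fraction >= min_fraction:
            found.update(m for m in members if m not in known)
    return found


labels = kmeans(vectors, k=2)
candidates = driver_candidates(labels, known_drivers)
print(sorted(candidates))
```

In this toy setup the unlabeled gene that shares a cluster with the known drivers is reported as a candidate; in practice such candidates would still need independent validation, as the authors did with a literature survey.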

CONCLUSIONS

We introduce Mut2Vec, which can be used to generate distributed representations of mutations, and we experimentally validate the efficacy of the generated representations. Mut2Vec can be used in various deep learning applications, such as cancer classification and drug sensitivity prediction.


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45c0/5918431/c19ba72d56ba/12920_2018_349_Fig1_HTML.jpg
