

PAN: Personalized Annotation-Based Networks for the Prediction of Breast Cancer Relapse.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2841-2847. doi: 10.1109/TCBB.2021.3076422. Epub 2021 Dec 8.

Abstract

The classification of clinical samples based on gene expression data is an important part of precision medicine. In this manuscript, we show how transforming gene expression data into a set of personalized (sample-specific) networks allows us to harness existing graph-based methods to improve classifier performance. Existing approaches to personalized gene networks are limited in that they depend on the other samples in the data and must be recomputed whenever a new sample is introduced. Here, we propose a novel method, called Personalized Annotation-based Networks (PAN), that avoids this limitation by using curated annotation databases to transform gene expression data into a graph. Unlike competing methods, PANs are calculated for each sample independently of the population, which makes the method a more efficient way to obtain single-sample networks. Using three breast cancer datasets as a case study, we show that PAN classifiers not only predict cancer relapse better than gene features alone, but also outperform PPI (protein-protein interaction) and population-level graph-based classifiers. This work demonstrates the practical advantages of graph-based classification for high-dimensional genomic data, while offering a new approach to making sample-specific networks. Supplementary information: PAN and the baselines are implemented in Python. Source code and data are available at https://github.com/thinng/PAN.
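To make the idea of a population-independent, annotation-based network concrete, here is a minimal Python sketch. The abstract does not specify how PAN builds its graphs, so everything in it is an assumption made for illustration: the toy annotation map `ANNOTATIONS`, the function name `build_sample_network`, the use of annotation terms as nodes, the mean as the aggregation, and the shared-gene rule for edges. It is not the authors' implementation, which is available in the repository linked above.

```python
from itertools import combinations
from statistics import mean

# Hypothetical annotation map (term -> member genes). A real map would come
# from a curated annotation database; these names are illustrative only.
ANNOTATIONS = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3", "g4"},
    "term_C": {"g5"},
}


def build_sample_network(expression, annotations=ANNOTATIONS):
    """Build a graph from ONE sample's expression values (assumed scheme).

    Only this sample's values and the fixed annotation map are used, so
    adding new samples never requires recomputing existing networks.
    """
    # Node weight (assumption): mean expression of the term's measured genes.
    nodes = {}
    for term, genes in annotations.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            nodes[term] = mean(values)

    # Edge weight (assumption): mean expression of genes shared by two terms.
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = [
            expression[g]
            for g in annotations[a] & annotations[b]
            if g in expression
        ]
        if shared:
            edges[(a, b)] = mean(shared)
    return nodes, edges


# One sample's expression profile (made-up values).
sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0}
nodes, edges = build_sample_network(sample)
```

Because the function takes a single expression profile and a fixed annotation map, each sample's graph can be computed in isolation, which is the property the abstract contrasts with population-dependent methods. The resulting node and edge weights could then be passed to any graph-based classifier.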

