Ye Jieping, Liu Jun
Arizona State University, Tempe, AZ 85287
SIGKDD Explor. 2012 Jun 1;14(1):4-15. doi: 10.1145/2408736.2408739.
Following recent technological revolutions, the investigation of massive biomedical data of growing scale, diversity, and complexity has taken center stage in modern data analysis. Although these data are complex, their underlying representations are often sparse. For example, for a given disease such as leukemia, only a few of the tens of thousands of human genes are relevant to the disease; a gene network is sparse because a regulatory pathway involves only a small number of genes; and many biomedical signals are sparse or compressible in the sense that they have concise representations when expressed in a proper basis. Finding sparse representations is therefore fundamentally important for scientific discovery. Sparse methods based on the $\ell_1$ norm have attracted a great deal of research effort in the past decade owing to the norm's sparsity-inducing property, convenient convexity, and strong theoretical guarantees. They have achieved great success in applications such as biomarker selection, biological network construction, and magnetic resonance imaging. In this paper, we review state-of-the-art sparse methods and their applications to biomedical data.
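To make the sparsity-inducing property concrete, the following is a minimal sketch, not taken from the paper, of an $\ell_1$-penalized least-squares problem (commonly called the lasso) solved by iterative soft-thresholding. The function names `soft_threshold` and `lasso_ista`, the toy data, and the penalty value are all illustrative assumptions, and the step size is assumed to be no larger than the reciprocal of the largest eigenvalue of X^T X.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink toward zero, clip small values to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(X, y, lam, step, n_iter=500):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by iterative soft-thresholding.

    X is a list of rows, y a list of targets. `step` is assumed to satisfy
    step <= 1 / (largest eigenvalue of X^T X) so that the iteration converges.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residual r = X w - y
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares term: X^T r
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step, then soft-thresholding (the l1 proximal step)
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(p)]
    return w


# Toy example (hypothetical data): with an identity design matrix the
# solution is simply the soft-thresholded response vector.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -1.5]
w = lasso_ista(X, y, lam=0.5, step=1.0, n_iter=10)
print(w)  # [2.5, 0.0, -1.0]
```

In this toy case the small coefficient (0.2) is set exactly to zero while the larger ones are only shrunk, which is the behavior that makes the penalty useful for tasks such as selecting a few relevant genes out of many candidates.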