CFMDS: CUDA-based fast multidimensional scaling for genome-scale data.

Affiliation

School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Korea.

Publication information

BMC Bioinformatics. 2012;13(Suppl 17):S23. doi: 10.1186/1471-2105-13-S17-S23. Epub 2012 Dec 13.

Abstract

BACKGROUND

Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction and has been applied to feature selection and visualization in various areas. Among the diverse MDS methods, classical MDS is a simple and theoretically sound solution for projecting data objects onto a low-dimensional space while preserving the original distances among them as much as possible. However, applying it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers is not trivial because of its high computational complexity.
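As a point of reference, the textbook form of classical MDS can be sketched in a few lines of plain Python. This is a minimal CPU illustration, not the CFMDS implementation: the function name `classical_mds`, its parameters, and the use of power iteration with deflation to obtain the leading eigenvectors are assumptions made here for illustration, and the sketch assumes Euclidean-like distances so that the double-centered matrix has non-negative leading eigenvalues.

```python
import math


def classical_mds(dist, k=2, iters=500):
    """Embed n objects in k dimensions from an n x n symmetric distance matrix.

    Illustrative sketch only: assumes Euclidean-like distances and uses
    simple power iteration, which is far slower than a tuned eigensolver.
    """
    n = len(dist)
    # Squared distances, then double centering: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * k for _ in range(n)]
    for d in range(k):
        # Deterministic, normalized start vector for the power iteration.
        v = [math.sin(i + 1.0 + d) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:  # remaining dimensions stay at zero
            break
        # Coordinates along this axis are sqrt(eigenvalue) * eigenvector.
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvalue.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Usage: four corners of a 3 x 4 rectangle, recovered up to rotation/reflection.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(distances, k=2)
```

The sketch stores the full n x n matrix and multiplies by it repeatedly, which is what makes the method expensive in both memory and time as n grows to genome scale.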

RESULTS

We implemented a highly efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture that exploits the power of the GPU (graphics processing unit). The divide-and-conquer principle was adopted to circumvent the limited memory of typical graphics cards. The software was tested on various benchmark datasets, including microarrays, and compared with classical MDS algorithms implemented in C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster on large data than these general solutions. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions: the Pearson correlation coefficients between them were larger than 0.9.
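The abstract does not state which quantities were correlated. One plausible way to run such a quality check, sketched below, is to compute the Pearson correlation between the pairwise distances of two embeddings of the same objects. The helper names `pearson`, `pairwise_distances`, and `embedding_agreement` are hypothetical and chosen here only for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def pairwise_distances(points):
    """Euclidean distances over all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def embedding_agreement(emb_a, emb_b):
    """Correlate the pairwise distances of two embeddings of the same objects.

    Assumption: both embeddings list the objects in the same order.
    """
    return pearson(pairwise_distances(emb_a), pairwise_distances(emb_b))


# Usage: the second embedding is the first rotated by 90 degrees and doubled.
exact = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
approx = [(0.0, 0.0), (0.0, 2.0), (-4.0, 0.0)]
score = embedding_agreement(exact, approx)
```

Comparing distances rather than raw coordinates makes the score insensitive to rotation, reflection, and uniform scaling, none of which change how well an MDS solution preserves the relative structure of the data.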

CONCLUSIONS

CFMDS is an expeditious solution to the data dimensionality reduction problem. It is especially useful for efficiently processing genome-scale data consisting of several thousand objects within several minutes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/3521231/1a91e254896e/1471-2105-13-S17-S23-1.jpg
