A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases.

Affiliation

Department of Veterinary Bioscience, University of Helsinki, Agnes Sjöbergin katu 2, PO Box 66, FI-00014 University of Helsinki, Finland.

Publication information

Nucleic Acids Res. 2013 May 1;41(10):e110. doi: 10.1093/nar/gkt229. Epub 2013 Apr 5.

Abstract

Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available for only a few platforms, and these rely on probe effects pre-calculated from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales linearly with sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters through sequential hyperparameter updates on small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
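The idea of learning probe-level parameters through sequential hyperparameter updates can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each probe's deviation from the true signal is zero-mean Gaussian noise with a probe-specific variance that carries a conjugate inverse-gamma prior. The function names (`update_inverse_gamma`, `online_fit`), the prior values, and the simulated data are all hypothetical. Under these assumptions only the hyperparameters need to be kept between batches, so memory use does not grow with the number of arrays.

```python
import numpy as np


def update_inverse_gamma(alpha, beta, residuals):
    """Conjugate update of inverse-gamma hyperparameters, one pair per probe.

    residuals: array of shape (n_arrays, n_probes), assumed to be zero-mean
    Gaussian probe deviations (an illustrative assumption, not the paper's model).
    """
    residuals = np.asarray(residuals, dtype=float)
    n_arrays = residuals.shape[0]
    alpha_new = alpha + n_arrays / 2.0
    beta_new = beta + 0.5 * np.sum(residuals ** 2, axis=0)
    return alpha_new, beta_new


def online_fit(batches, n_probes, alpha0=1.0, beta0=1.0):
    """Process batches one at a time; only the hyperparameters are retained."""
    alpha = np.full(n_probes, alpha0)
    beta = np.full(n_probes, beta0)
    for batch in batches:
        alpha, beta = update_inverse_gamma(alpha, beta, batch)
    return alpha, beta


# Simulated deviations for three hypothetical probes with different noise levels.
rng = np.random.default_rng(0)
true_sd = np.array([0.2, 0.5, 2.0])
data = rng.normal(0.0, true_sd, size=(3000, 3))

# Split the 3000 arrays into 30 consecutive batches of 100 arrays each.
batches = np.array_split(data, 30)
alpha, beta = online_fit(batches, n_probes=3)

# Posterior mean of each probe's noise variance (defined for alpha > 1).
posterior_mean_var = beta / (alpha - 1.0)
print(posterior_mean_var)
```

Because the update is conjugate, processing the data in batches gives the same hyperparameters as a single pass over the full data set, while the cost grows linearly with the number of arrays. Probes with a large posterior variance would be the candidates to flag as noisy.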

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45ac/3664815/129f01ca15bc/gkt229f1p.jpg
