High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI).

Authors

Liu Cong, Jiang Jianping, Gu Jianlei, Yu Zhangsheng, Wang Tao, Lu Hui

Affiliations

Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA.

SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.

Publication information

BMC Syst Biol. 2016 Dec 23;10(Suppl 4):118. doi: 10.1186/s12918-016-0358-0.

DOI:10.1186/s12918-016-0358-0

PMID:28155690

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC5260139/

Abstract

BACKGROUND

High-throughput technologies can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often barely reproducible because of small sample sizes. Various statistical methods have been proposed to tackle this "small n, large p" scenario; for example, pooling or integrating different datasets can be an effective way to improve reproducibility. However, the raw data are often either unavailable or hard to integrate because of differing experimental conditions, so there is an emerging need for a method of "knowledge integration" in high-throughput data analysis.

RESULTS

In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, achieving a higher true positive rate for the same number of selected variables. We also applied the method in a drug response study and found that it performed better than regular screening methods.
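The idea of building one rank from a knowledge-based rank and a marginal-correlation-based rank can be illustrated with a minimal Python sketch. The abstract does not state how the two ranks are combined, so the weighted geometric mean used here is an assumption, as are the weight `alpha`, the cutoff `top_k`, and all function names. This is not the API of the SKI R package.

```python
import math


def rank_descending(scores):
    """Return 1-based ranks; the largest score gets rank 1 (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no marginal signal
    return abs(sxy) / math.sqrt(sxx * syy)


def combined_screen(columns, response, prior_scores, alpha=0.5, top_k=10):
    """Select the indices of the top_k variables by a combined rank.

    columns:      list of p variables, each a list of n measurements.
    response:     list of n outcome values.
    prior_scores: p prior-knowledge scores (larger = more plausible).
    alpha:        hypothetical weight on the knowledge-based rank, in [0, 1].
    """
    marginal = [abs_pearson(col, response) for col in columns]
    r_data = rank_descending(marginal)       # marginal-correlation-based rank
    r_prior = rank_descending(prior_scores)  # knowledge-based rank
    # Assumed combining rule: weighted geometric mean of the two ranks.
    combined = [
        (rp ** alpha) * (rd ** (1 - alpha)) for rp, rd in zip(r_prior, r_data)
    ]
    order = sorted(range(len(combined)), key=lambda j: (combined[j], j))
    return order[:top_k]
```

Under this assumed rule, `alpha=0` reduces to ordinary marginal-correlation screening and `alpha=1` ranks variables by prior knowledge alone; intermediate values trade off the two sources.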

CONCLUSION

The proposed method provides an effective way to integrate knowledge into high-throughput analysis. It can be easily implemented with our R package, SKI.

