High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI).

Authors

Liu Cong, Jiang Jianping, Gu Jianlei, Yu Zhangsheng, Wang Tao, Lu Hui

Affiliations

Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA.

SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.

Publication information

BMC Syst Biol. 2016 Dec 23;10(Suppl 4):118. doi: 10.1186/s12918-016-0358-0.

DOI: 10.1186/s12918-016-0358-0
PMID: 28155690
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5260139/

Abstract

BACKGROUND

High-throughput technologies can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often poorly reproducible because of small sample sizes. Various statistical methods have been proposed to tackle this "small n, large p" scenario; for example, different datasets can be pooled or integrated to improve reproducibility. However, raw data are often either unavailable or hard to integrate because of differing experimental conditions, so there is an emerging need for a method of "knowledge integration" in high-throughput data analysis.

RESULTS

In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, achieving a higher true positive rate for the same number of selected variables. We also applied the method to a drug response study, where it performed better than regular screening methods.
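
The rank-combination idea described above can be illustrated with a minimal Python sketch. The abstract does not state how the two ranks are merged, so the weighted geometric mean used here, the weight `alpha`, and all function names are assumptions made for illustration only; they are not the interface of the authors' SKI R package.

```python
import numpy as np


def marginal_correlation_rank(X, y):
    """Rank variables by absolute Pearson correlation with y (1 = strongest).

    Assumes no column of X is constant, otherwise the denominator is zero.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / denom
    order = np.argsort(-corr, kind="stable")
    ranks = np.empty(len(corr), dtype=int)
    ranks[order] = np.arange(1, len(corr) + 1)
    return ranks


def combine_ranks(knowledge_rank, data_rank, alpha=0.5):
    """Hypothetical merge: weighted geometric mean of both ranks, then re-rank.

    alpha = 1 reproduces the knowledge-based rank, alpha = 0 the data rank.
    """
    knowledge_rank = np.asarray(knowledge_rank, dtype=float)
    data_rank = np.asarray(data_rank, dtype=float)
    score = knowledge_rank ** alpha * data_rank ** (1.0 - alpha)
    order = np.argsort(score, kind="stable")
    combined = np.empty(len(score), dtype=int)
    combined[order] = np.arange(1, len(score) + 1)
    return combined


def true_positive_rate(ranks, true_idx, k):
    """Fraction of truly relevant variables among the k top-ranked ones."""
    selected = set(np.flatnonzero(np.asarray(ranks) <= k).tolist())
    truth = set(int(i) for i in true_idx)
    return len(selected & truth) / len(truth)


if __name__ == "__main__":
    # Simulated "small n, large p" data: 30 samples, 500 variables,
    # of which the first 5 truly drive the response.
    rng = np.random.default_rng(0)
    n, p, true_idx = 30, 500, [0, 1, 2, 3, 4]
    X = rng.normal(size=(n, p))
    y = X[:, true_idx].sum(axis=1) + rng.normal(size=n)

    data_rank = marginal_correlation_rank(X, y)
    # Simulated prior knowledge: a noisy ranking that tends to favour
    # the true variables (purely synthetic, for demonstration).
    prior_score = rng.normal(size=p)
    prior_score[true_idx] += 2.0
    knowledge_rank = np.empty(p, dtype=int)
    knowledge_rank[np.argsort(-prior_score)] = np.arange(1, p + 1)

    combined = combine_ranks(knowledge_rank, data_rank, alpha=0.5)
    for name, r in [("data only", data_rank), ("combined", combined)]:
        print(name, true_positive_rate(r, true_idx, k=20))
```

The comparison at the end mirrors the evaluation criterion named in the abstract: the true positive rate at a fixed number of selected variables (`k`), computed with and without the knowledge-based rank.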

CONCLUSION

The proposed method provides an effective way to integrate knowledge into high-throughput analysis. It can be easily implemented with our R package, SKI.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dec/5260139/11776a10ac74/12918_2016_358_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dec/5260139/e381c3241e37/12918_2016_358_Fig2_HTML.jpg
