Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries.

Authors

Dahinden Corinne, Parmigiani Giovanni, Emerick Mark C, Bühlmann Peter

Affiliation

Seminar für Statistik, ETH Zürich, CH-8092 Zürich, Switzerland.

Publication

BMC Bioinformatics. 2007 Dec 11;8:476. doi: 10.1186/1471-2105-8-476.

DOI:10.1186/1471-2105-8-476
PMID:18072965
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2233645/
Abstract

BACKGROUND

The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species.

RESULTS

We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient l1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries.
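
As a rough illustration of the idea of l1-penalized log-linear modeling, the sketch below fits a multinomial log-linear model to a small 2x2x2 table containing zero cells. It is not the authors' algorithm: it assumes binary factors, a +/-1 (sum-to-zero) coding of the interaction terms, and a plain proximal-gradient (soft-thresholding) solver, and all function names, the example counts, and the penalty value `lam` are hypothetical.

```python
import itertools
import numpy as np

def design_matrix(d):
    """Cells of a 2^d table and +/-1-coded columns for every main effect
    and interaction (no intercept; the multinomial model does not need one)."""
    cells = np.array(list(itertools.product([0, 1], repeat=d)))
    signs = 1 - 2 * cells  # level 0 -> +1, level 1 -> -1
    terms = [t for k in range(1, d + 1)
             for t in itertools.combinations(range(d), k)]
    X = np.column_stack([signs[:, list(t)].prod(axis=1) for t in terms])
    return cells, terms, X.astype(float)

def soft_threshold(z, t):
    """Proximal operator of t * |.|_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_l1_loglinear(counts, X, lam, n_iter=5000):
    """Minimize  -sum_i f_i * eta_i + log(sum_i exp(eta_i)) + lam * |beta|_1,
    where eta = X @ beta and f are the observed cell frequencies."""
    freq = counts / counts.sum()
    # The Hessian of the smooth part is bounded by X^T X, so 1 / ||X||_2^2
    # is a safe fixed step size.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = np.exp(eta - eta.max())   # fitted cell probabilities (softmax)
        p /= p.sum()
        grad = X.T @ (p - freq)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical sparse table for factors (A, B, C); cells are ordered
# 000, 001, 010, ..., 111. A and B are strongly associated, C is noise.
counts = np.array([40, 38, 0, 2, 1, 0, 41, 39], dtype=float)
cells, terms, X = design_matrix(3)
beta = fit_l1_loglinear(counts, X, lam=0.05)
for term, b in zip(terms, beta):
    print(term, round(float(b), 3))
```

In this toy table the penalty shrinks every coefficient except the A:B interaction, term `(0, 1)`, to exactly zero, so model selection and estimation happen in a single fit. The zero cells cause no difficulty here, whereas the unpenalized saturated model has no finite maximum likelihood estimate for such a table.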

CONCLUSION

We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables.
