IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly.

Authors

Li Wei, Feng Jianxing, Jiang Tao

Affiliation

Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92507, USA.

Publication

J Comput Biol. 2011 Nov;18(11):1693-707. doi: 10.1089/cmb.2011.0171. Epub 2011 Sep 27.

Abstract

Second-generation sequencing technology has revolutionized many biology-related research fields and poses various computational biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims to reconstruct all full-length mRNA transcripts simultaneously from millions of short reads. In this article, we consider three objectives in transcriptome assembly: the maximization of prediction accuracy, the minimization of interpretation, and the maximization of completeness. The first objective, the maximization of prediction accuracy, requires that the expression levels estimated from the assembled transcripts be as close as possible to the observed ones for every expressed region of the genome. The second objective, the minimization of interpretation, follows the parsimony principle and seeks as few transcripts in the prediction as possible. The third objective, the maximization of completeness, requires that the maximum number of mapped reads (or "expressed segments" in gene models) be explained by (i.e., contained in) the predicted transcripts in the solution. Based on these three objectives, we present IsoLasso, a new RNA-Seq based transcriptome assembly tool. IsoLasso is based on the well-known LASSO algorithm, a multivariate regression method designed to seek a balance between the maximization of prediction accuracy and the minimization of interpretation. By including some additional constraints in the quadratic program involved in LASSO, IsoLasso is able to make the set of assembled transcripts as complete as possible. Experiments on simulated and real RNA-Seq datasets show that IsoLasso simultaneously achieves higher sensitivity and precision than state-of-the-art transcript assembly tools.
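To illustrate how a LASSO-style objective can trade prediction accuracy against the number of transcripts, here is a minimal sketch. It is not the IsoLasso implementation. It assumes a hypothetical toy gene with three expressed segments and four candidate transcripts, models each observed segment expression level as the sum of the abundances of the transcripts containing that segment, and solves a nonnegative LASSO by plain coordinate descent. The function names, the toy data, and the penalty weight `lam` are all assumptions made for illustration. Completeness is only checked after the fit here, whereas the abstract describes adding it as constraints inside the quadratic program.

```python
# Toy sketch of a nonnegative LASSO for transcript abundance estimation.
# NOT the IsoLasso code: names, data and the solver choice are illustrative assumptions.

def nonneg_lasso(A, y, lam, sweeps=500):
    """Minimize 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0
    using cyclic coordinate descent.
    A: list of rows (segments x candidate transcripts), y: observed segment levels."""
    n_seg, n_tx = len(A), len(A[0])
    x = [0.0] * n_tx
    resid = [float(v) for v in y]  # residual y - A x (x starts at zero)
    col_sq = [sum(A[i][j] ** 2 for i in range(n_seg)) for j in range(n_tx)]
    for _ in range(sweeps):
        for j in range(n_tx):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes transcript j.
            rho = sum(A[i][j] * (resid[i] + A[i][j] * x[j]) for i in range(n_seg))
            new_xj = max(0.0, (rho - lam) / col_sq[j])  # shrink, then clip at zero
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_seg):
                    resid[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def uncovered_segments(A, y, x, tol=1e-9):
    """Indices of expressed segments not contained in any selected transcript."""
    return [
        i for i in range(len(A))
        if y[i] > 0 and not any(A[i][j] and x[j] > tol for j in range(len(x)))
    ]


# Hypothetical candidates over segments (s1, s2, s3):
# t1 = {s1, s2, s3}, t2 = {s1, s3}, t3 = {s1, s2}, t4 = {s2, s3}
A = [
    [1, 1, 1, 0],  # s1
    [1, 0, 1, 1],  # s2
    [1, 1, 0, 1],  # s3
]
y = [15.0, 10.0, 15.0]  # observed segment expression levels (made up)

abundances = nonneg_lasso(A, y, lam=0.1)
print([round(v, 3) for v in abundances])   # [10.0, 4.95, 0.0, 0.0]
print(uncovered_segments(A, y, abundances))  # []
```

In this toy case the penalty drives the abundances of t3 and t4 to zero, so the data are explained by two transcripts instead of four, and every expressed segment is still covered by a selected transcript. The penalty also shrinks the estimate for t2 slightly below the value an unpenalized fit would give, which is the usual LASSO trade-off between fit and sparsity.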
