A Bayesian nonparametric method for prediction in EST analysis.

Author information

Lijoi Antonio, Mena Ramsés H, Prünster Igor

Affiliations

Department of Economics and Quantitative Methods, University of Pavia, 27100 Pavia and Institute for Applied Mathematics and Information Technology, National Research Council, 20133 Milan, Italy.

Publication information

BMC Bioinformatics. 2007 Sep 14;8:339. doi: 10.1186/1471-2105-8-339.

DOI:10.1186/1471-2105-8-339
PMID:17868445
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2220008/
Abstract

BACKGROUND

Expressed sequence tag (EST) analyses are a fundamental tool for identifying genes in organisms. Given a preliminary EST sample from a library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of a given size and to determine the gene discovery rate. These estimates form the basis for deciding whether to continue sequencing the library and, if so, guide the choice of the size of the new sample. Such information is also useful for assessing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library.
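Once reads have been clustered into genes, prediction problems of this kind typically start from a few summary counts: the number of reads n, the number of distinct genes j, and how many genes were seen exactly once, twice, and so on. The sketch below computes these from per-read gene labels. It assumes clustering has already been done, and the function name, the toy labels, and the redundancy measure 1 - j/n (share of reads that hit an already-seen gene) are illustrative assumptions, not definitions taken from the paper.

```python
from collections import Counter


def summarize_est_sample(gene_labels):
    """Summarize an EST sample given one gene (cluster) label per read.

    Returns a tuple (n, j, freq_of_freq, redundancy):
      n            -- number of reads
      j            -- number of distinct genes
      freq_of_freq -- {i: number of genes seen exactly i times}
      redundancy   -- 1 - j/n, the share of reads that hit an
                      already-seen gene (an illustrative definition)
    """
    reads_per_gene = Counter(gene_labels)           # reads observed per gene
    n = sum(reads_per_gene.values())
    j = len(reads_per_gene)
    # Frequency-of-frequencies table, sorted by multiplicity i.
    freq_of_freq = dict(sorted(Counter(reads_per_gene.values()).items()))
    redundancy = 1 - j / n if n else 0.0
    return n, j, freq_of_freq, redundancy


# Hypothetical reads, each already assigned to a gene cluster.
reads = ["g1", "g2", "g1", "g3", "g4", "g1", "g2", "g5"]
print(summarize_est_sample(reads))  # (8, 5, {1: 3, 2: 1, 3: 1}, 0.375)
```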

RESULTS

In this work we propose a Bayesian nonparametric approach to statistical problems arising in EST surveys. In particular, we provide estimates of: a) the coverage, defined as the proportion of unique genes in the library that are represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into prediction in a statistically rigorous way. Our proposal has appealing advantages over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries previously studied with frequentist methods are analyzed in detail.
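Once a prior is fixed, quantities like a), b) and c) have closed forms. The sketch below assumes a two-parameter Poisson-Dirichlet prior PD(σ, θ) (also called Pitman-Yor), a standard Bayesian nonparametric model for species sampling; the abstract itself does not name the prior, so treat this as an illustration, not the paper's exact estimators. Under PD(σ, θ), after n reads showing j distinct genes, read n+1 is a new gene with probability (θ + σj)/(θ + n), and the expected number of new genes in m further reads is (j + θ/σ)[(θ+n+σ)_m / (θ+n)_m - 1], where (x)_m is the rising factorial. Taking the expectation of the one-step probability after m more reads gives the discovery rate (θ + σj)/(θ + n + m) · (θ+n+σ)_m / (θ+n)_m. Coverage is computed here as the predictive probability that the next read comes from an already-observed gene, which is abundance-weighted and may differ from the paper's exact definition. Function names and parameter values are hypothetical.

```python
import math


def rising_factorial_ratio(a, b, m):
    """Return (a)_m / (b)_m, where (x)_m = x (x+1) ... (x+m-1).

    Computed through log-gamma so that very large m does not overflow.
    """
    log_ratio = (math.lgamma(a + m) - math.lgamma(a)) - (math.lgamma(b + m) - math.lgamma(b))
    return math.exp(log_ratio)


def pd_predictions(n, j, m, sigma, theta):
    """Predictive summaries under an assumed PD(sigma, theta) prior.

    n     -- number of reads already sequenced
    j     -- number of distinct genes among those reads
    m     -- size of the hypothetical additional sample
    sigma -- discount parameter, 0 < sigma < 1 (sigma = 0 not handled here)
    theta -- concentration parameter, theta > -sigma
    """
    if not (0 < sigma < 1 and theta > -sigma and 0 < j <= n and m >= 0):
        raise ValueError("invalid parameters")
    # Probability that read n+1 belongs to a gene not seen so far.
    p_new = (theta + sigma * j) / (theta + n)
    # Complement: probability of hitting an observed gene (coverage, as assumed here).
    coverage = 1.0 - p_new
    ratio = rising_factorial_ratio(theta + n + sigma, theta + n, m)
    # Expected number of new distinct genes among the next m reads.
    expected_new = (j + theta / sigma) * (ratio - 1.0)
    # Probability that read n+m+1 is new: the discovery rate after m more reads.
    discovery_rate = (theta + sigma * j) / (theta + n + m) * ratio
    return coverage, expected_new, discovery_rate


# Hypothetical library: 2,000 reads, 1,000 distinct genes, sigma = 0.5, theta = 100.
for m in (0, 1_000, 10_000, 100_000):
    cov, new, rate = pd_predictions(2_000, 1_000, m, sigma=0.5, theta=100.0)
    print(f"m={m:>7}: coverage={cov:.3f}  new genes={new:9.1f}  discovery rate={rate:.4f}")
```

The parameter values above are placeholders; in practice σ and θ would be estimated from the observed sample, for example by maximizing the marginal likelihood of the observed cluster sizes. The log-gamma form keeps the computation stable even when m is many times larger than n.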

CONCLUSION

The Bayesian nonparametric approach we adopt yields valuable tools for gene capture and prediction in EST libraries. The resulting estimators do not suffer from the drawbacks of frequentist estimators and are reliable for any size of the additional sample.
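To illustrate the kind of drawback frequentist extrapolation can have, the sketch below implements the classic Good-Toulmin estimator of the number of new genes in m extra reads, U(t) = sum over i of (-1)^(i+1) t^i f_i with t = m/n, where f_i is the number of genes seen exactly i times. It is used here only as a familiar example and is not necessarily the specific frequentist method the paper compares against; the frequency table is hypothetical. For t > 1 the alternating terms grow geometrically, so the estimate can fall as m grows or even turn negative, whereas any Bayesian predictive expectation of a count of new genes is automatically nonnegative and nondecreasing in m.

```python
def good_toulmin(freq_of_freq, n, m):
    """Good-Toulmin estimate of the number of new genes in m extra reads.

    freq_of_freq maps i -> f_i, the number of genes seen exactly i times
    among the first n reads. Uses U(t) = sum_i (-1)^(i+1) * t^i * f_i, t = m / n.
    """
    t = m / n
    return sum((-1) ** (i + 1) * t ** i * f_i for i, f_i in freq_of_freq.items())


# Hypothetical frequency-of-frequencies table (1,000 genes, 1,770 reads).
f = {1: 600, 2: 200, 3: 100, 4: 50, 5: 30, 6: 20}
n = sum(i * f_i for i, f_i in f.items())   # 1770 reads
for m in (885, 1770, 3540, 7080):          # t = 0.5, 1, 2, 4
    print(m / n, good_toulmin(f, n, m))
# 0.5 260.0
# 1.0 460.0
# 2.0 80.0
# 4.0 -58400.0
```

Doubling the extra sample from t = 1 to t = 2 lowers the predicted number of new genes, and at t = 4 the prediction is negative, which is the instability that motivates a model-based alternative for large future samples.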

Similar articles

1
A Bayesian nonparametric method for prediction in EST analysis.
BMC Bioinformatics. 2007 Sep 14;8:339. doi: 10.1186/1471-2105-8-339.
2
Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys.
Bioinformatics. 2004 Sep 22;20(14):2279-87. doi: 10.1093/bioinformatics/bth239. Epub 2004 Apr 1.
3
Statistical analysis of expressed sequence tags.
Methods Mol Biol. 2009;533:277-87. doi: 10.1007/978-1-60327-136-3_13.
4
Statistical modeling of sequencing errors in SAGE libraries.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i31-9. doi: 10.1093/bioinformatics/bth924.
5
Gene identification through large-scale EST sequence processing.
Appl Bioinformatics. 2003;2(3):123-9.
6
Bayesian decision-theoretic group sequential clinical trial design based on a quadratic loss function: a frequentist evaluation.
Clin Trials. 2007;4(1):5-14. doi: 10.1177/1740774506075764.
7
RBR: library-less repeat detection for ESTs.
Bioinformatics. 2006 Sep 15;22(18):2232-6. doi: 10.1093/bioinformatics/btl368. Epub 2006 Jul 12.
8
A Bayesian nonparametric approach for comparing clustering structures in EST libraries.
J Comput Biol. 2008 Dec;15(10):1315-27. doi: 10.1089/cmb.2008.0043.
9
A new estimator of the discovery probability.
Biometrics. 2012 Dec;68(4):1188-96. doi: 10.1111/j.1541-0420.2012.01793.x. Epub 2012 Oct 1.
10
Bayesian sample size determination for the accurate identification of the bacterial subtypes.
East Afr J Public Health. 2009 Apr;6 Suppl(1):37-8.

Cited by

1
A Bayesian Semi-parametric Approach for the Differential Analysis of Sequence Counts Data.
J R Stat Soc Ser C Appl Stat. 2014 Apr;63(3):385-404. doi: 10.1111/rssc.12041.
2
RICHEST--a web server for richness estimation in biological data.
Bioinformation. 2009;3(7):296-8. doi: 10.6026/97320630003296. Epub 2009 Feb 27.
3
A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers.
BMC Plant Biol. 2008 Jun 20;8:69. doi: 10.1186/1471-2229-8-69.