
Evaluation of Glycine max mRNA clusters.

Authors

Frank Ronald L, Ercal Fikret

Affiliation

Biological Sciences Department, University of Missouri-Rolla, Rolla, MO, USA.

Publication information

BMC Bioinformatics. 2005 Jul 15;6 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2105-6-S2-S7.

DOI:10.1186/1471-2105-6-S2-S7
PMID:16026604
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1637028/
Abstract

BACKGROUND

Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but it is currently not known how closely the overall results match that goal. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max, we applied our algorithm, PECT, to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) the submitting authors' conclusions, to determine whether members of a single cluster represented the same gene.
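The comparison step described above (finding clusters whose composition differs among the two stringency runs and UniGene) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the pairwise definition of "difference", and the toy sequence IDs are all assumptions made for illustration. It treats each clustering as a list of sets of sequence IDs and reports every pair of sequences that at least one method groups together but not all methods do.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered sequence-ID pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        # Sorting gives each pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def disputed_pairs(*clusterings):
    """Pairs grouped together by at least one clustering but not by all."""
    pair_sets = [co_clustered_pairs(c) for c in clusterings]
    return set.union(*pair_sets) - set.intersection(*pair_sets)


# Hypothetical toy data: four mRNA IDs clustered three ways.
low_stringency = [{"m1", "m2", "m3"}, {"m4"}]
high_stringency = [{"m1", "m2"}, {"m3"}, {"m4"}]
unigene = [{"m1", "m2"}, {"m3", "m4"}]

disputed = disputed_pairs(low_stringency, high_stringency, unigene)
for pair in sorted(disputed):
    print(pair)
```

In this toy case all three clusterings agree that m1 and m2 belong together, so only the pairs involving m3 are flagged. Flagged pairs are the candidates that would then go to manual review, for example by alignment.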

RESULTS

Of the 12 clusters that were examined closely, most contained sequences that did not belong in the same cluster. However, neither of the two PECT stringencies nor UniGene was significantly more accurate at placing paralogs into separate clusters.

CONCLUSION

Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b1/1637028/748929fda108/1471-2105-6-S2-S7-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b1/1637028/edcf8493ca98/1471-2105-6-S2-S7-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b1/1637028/fb526698c553/1471-2105-6-S2-S7-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b1/1637028/afea244a60d4/1471-2105-6-S2-S7-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b1/1637028/3aa177b711f9/1471-2105-6-S2-S7-5.jpg
