[Support vector data description for finding non-coding RNA gene].

Authors

Zhao Yingjie, Wang Zhengzhi

Affiliation

College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073, China.

Publication

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2010 Aug;27(4):779-84.

PMID: 20842844

Abstract

In computational molecular biology, detecting non-coding RNA genes among large numbers of unlabeled sequences remains a challenging problem. Machine learning and classification methods are generally used to address it. However, only a limited number of positive training samples, together with unlabeled samples, are available. Negative samples are difficult to define appropriately, yet the usual learn-then-classify approach requires them. Most existing non-coding RNA gene-finding methods generate random sequences as negative samples, and these may share some characteristics of the positive sample sequences. This introduces an artificial source of uncertainty, and the performance of such methods has not been good enough. In this paper, Support Vector Data Description (SVDD) is used for learning and classification in order to detect non-coding RNA genes among large numbers of unlabeled sequences, and the k-means clustering algorithm is applied before SVDD training to address the high false positive rate in the SVDD results. The training samples (target samples) are experimentally validated non-coding RNA genes. In addition, appropriate features are constructed by Principal Component Analysis (PCA). The effectiveness and performance of the method are demonstrated by tests on cases from the NONCODE database and the E. coli genome.
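
The idea of clustering the target samples before fitting a data description can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the kernel, the feature set, or the number of clusters, so the sketch assumes a linear kernel and a hard margin, in which case SVDD reduces to the minimum enclosing ball of the target samples. The ball is approximated with the Badoiu-Clarkson iteration, the feature vectors are assumed to be already projected (for example by PCA), and all function names and data points are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [cl for cl in clusters if cl]

def fit_ball(points, iters=1000):
    """Approximate minimum enclosing ball (hard-margin, linear-kernel SVDD).

    Badoiu-Clarkson iteration: repeatedly step the centre towards the
    currently farthest point with a shrinking step size.
    """
    center = list(points[0])
    for t in range(1, iters + 1):
        far = max(points, key=lambda p: dist(p, center))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, far)]
    # The radius covers every target sample by construction.
    radius = max(dist(p, center) for p in points)
    return center, radius

def fit_clustered_description(targets, k):
    """Fit one ball per k-means cluster of the target samples."""
    return [fit_ball(cluster) for cluster in kmeans(targets, k)]

def accepts(balls, x):
    """A sample is accepted as a target if it lies inside any ball."""
    return any(dist(x, center) <= radius for center, radius in balls)

# Hypothetical target samples forming two well-separated groups.
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
gap = (5.5, 5.5)  # lies in the empty region between the two groups

single = [fit_ball(targets)]                       # one description for all targets
clustered = fit_clustered_description(targets, 2)  # one description per cluster

print("single ball accepts gap point:", accepts(single, gap))
print("clustered balls accept gap point:", accepts(clustered, gap))
```

In this toy setting a single ball around all targets also covers the empty space between the two groups, so a sample from that region is accepted as a false positive, while the per-cluster balls reject it. A soft-margin, kernelised SVDD would need a quadratic-programming solver in place of `fit_ball`.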

Similar articles

1
[Support vector data description for finding non-coding RNA gene].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2010 Aug;27(4):779-84.
2
PSoL: a positive sample only learning algorithm for finding non-coding RNA genes.
Bioinformatics. 2006 Nov 1;22(21):2590-6. doi: 10.1093/bioinformatics/btl441. Epub 2006 Aug 31.
3
Theoretical analysis for solution of support vector data description.
Neural Netw. 2011 May;24(4):360-9. doi: 10.1016/j.neunet.2011.01.007. Epub 2011 Feb 3.
4
Diagnostic pattern recognition on gene-expression profile data by using one-class classification.
J Chem Inf Model. 2005 Sep-Oct;45(5):1392-401. doi: 10.1021/ci049726v.
5
[Support vector machine based high intensity focused ultrasound beam lesion degree classification and recognition].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2010 Oct;27(5):978-83.
6
Laplacian twin support vector machine for semi-supervised classification.
Neural Netw. 2012 Nov;35:46-53. doi: 10.1016/j.neunet.2012.07.011. Epub 2012 Aug 10.
7
Unsupervised active learning based on hierarchical graph-theoretic clustering.
IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1147-61. doi: 10.1109/TSMCB.2009.2013197. Epub 2009 Mar 24.
8
Identification of coding and non-coding sequences using local Holder exponent formalism.
Bioinformatics. 2005 Oct 15;21(20):3818-23. doi: 10.1093/bioinformatics/bti639. Epub 2005 Aug 23.
9
SemiBoost: boosting for semi-supervised learning.
IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2000-14. doi: 10.1109/TPAMI.2008.235.
10
Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.
Genome Inform. 2008;21:200-11.