A Study of Domain Adaptation Classifiers Derived From Logistic Regression for the Task of Splice Site Prediction.

Authors

Herndon Nic, Caragea Doina

Publication information

IEEE Trans Nanobioscience. 2016 Mar;15(2):75-83. doi: 10.1109/TNB.2016.2522400. Epub 2016 Jan 28.

DOI: 10.1109/TNB.2016.2522400
PMID: 26849871
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4894847/
Abstract

Supervised classifiers are highly dependent on abundant labeled training data. Alternatives for addressing the lack of labeled data include: labeling data (but this is costly and time consuming); training classifiers with abundant data from another domain (however, the classification accuracy usually decreases as the distance between domains increases); or complementing the limited labeled data with abundant unlabeled data from the same domain and learning semi-supervised classifiers (but the unlabeled data can mislead the classifier). A better alternative is to train classifiers in a domain adaptation setting, using the abundant labeled data from a source domain together with the limited labeled data, and optionally the unlabeled data, from the target domain. We propose two such classifiers, based on logistic regression, and evaluate them for the task of splice site prediction, a difficult and essential step in gene prediction. Our classifiers achieved high accuracy, with highest areas under the precision-recall curve between 50.83% and 82.61%.
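
The abstract does not spell out the two proposed classifiers, so the following is only a minimal sketch of the general setting it describes, not the authors' method. It assumes a simple instance-weighting scheme: one logistic regression is trained on the pooled source and target examples, with a hypothetical trade-off parameter `delta` giving the scarce target set a fixed share of the total loss. The toy data generator, all function names, and the hyperparameters are illustrative assumptions. The area under the precision-recall curve is estimated with the step-wise average-precision formula.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_logreg(X, y, weights, lr=0.5, epochs=300, l2=1e-3):
    """Batch gradient descent on a per-example weighted logistic loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, weights):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += wi * err * xj
            grad_b += wi * err
        # L2 penalty is applied to the weights only, not to the bias.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


def domain_adapt_weights(n_source, n_target, delta):
    """Assumed scheme: source set gets total weight 1 - delta, target set delta."""
    return [(1.0 - delta) / n_source] * n_source + [delta / n_target] * n_target


def make_domain(rng, n, shift):
    # Toy two-feature data; `shift` displaces the feature distribution
    # so that source and target domains differ.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center + shift, 1.0), rng.gauss(center - shift, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
Xs, ys = make_domain(rng, 300, shift=0.0)        # abundant labeled source data
Xt, yt = make_domain(rng, 20, shift=0.8)         # scarce labeled target data
Xtest, ytest = make_domain(rng, 300, shift=0.8)  # held-out target data

weights = domain_adapt_weights(len(Xs), len(Xt), delta=0.5)
model = train_weighted_logreg(Xs + Xt, ys + yt, weights)
auprc = average_precision(ytest, predict_proba(model, Xtest))
print(f"target-domain auPRC: {auprc:.3f}")
```

Setting `delta` close to 0 approaches training on the source domain alone, and setting it close to 1 approaches training on the few target labels alone. In practice such a parameter would be tuned on held-out target data.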

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5a8/4894847/79d23a120d6b/nihms787617f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5a8/4894847/b6cd49003702/nihms787617f1.jpg