Non-homology-based prediction of gene functions in maize (Zea mays ssp. mays).

Affiliations

State Key Laboratory of Crop Biology, Shandong Agricultural University, Taian, 273100, China.

Quantitative Life Sciences Initiative, Center for Plant Science Innovation, and Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.

Publication information

Plant Genome. 2020 Jul;13(2):e20015. doi: 10.1002/tpg2.20015. Epub 2020 Apr 29.

DOI: 10.1002/tpg2.20015
PMID: 33016608
Abstract

Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non-homology gene features. Among the eight supervised classification algorithms evaluated, random-forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides complementary strengths to homology-based annotation, with higher average performance in Biological Process GO terms, the domain where homology-based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology-based functional annotation is highest. GO prediction models trained with homology-based annotations were able to successfully predict annotations from a manually curated "gold standard" GO annotation set. Non-homology-based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology-based functional annotations.
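The abstract reports that predicted GO annotations were checked against a manually curated "gold standard" set, but it does not say which metric was used. As a minimal sketch of what such a comparison could look like, the snippet below computes micro-averaged precision, recall, and F1 over (gene, GO term) pairs. The function name `score_against_gold`, the gene identifiers, and the GO IDs are placeholders chosen for illustration; they are assumptions and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Maps a gene identifier to the set of GO term IDs assigned to it.
Annotations = Dict[str, Set[str]]


def score_against_gold(
    predicted: Annotations, gold: Annotations
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over (gene, GO term) pairs.

    Illustrative only: the metric used in the paper is not stated in the abstract.
    """
    # Flatten both annotation sets into comparable (gene, term) pairs.
    pred_pairs = {(gene, term) for gene, terms in predicted.items() for term in terms}
    gold_pairs = {(gene, term) for gene, terms in gold.items() for term in terms}

    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: placeholder gene names and GO IDs.
predicted = {
    "gene_A": {"GO:0000001", "GO:0000002"},
    "gene_B": {"GO:0000003"},
}
gold = {
    "gene_A": {"GO:0000001"},
    "gene_B": {"GO:0000003", "GO:0000004"},
}

precision, recall, f1 = score_against_gold(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Scoring at the level of (gene, term) pairs keeps the sketch short. A fuller evaluation would probably also account for the GO hierarchy, for example by crediting a predicted parent term when the gold standard lists one of its children.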


