A deep learning approach for orphan gene identification in moso bamboo (Phyllostachys edulis) based on the CNN + Transformer model.

Affiliations

Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei, 230001, China.

College of Information and Computer Science, Anhui Agricultural University, Hefei, 230001, China.

Publication information

BMC Bioinformatics. 2022 May 5;23(1):162. doi: 10.1186/s12859-022-04702-1.

DOI:10.1186/s12859-022-04702-1
PMID:35513802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9069780/
Abstract

BACKGROUND

Orphan genes play an important role in the responses of many species to environmental stresses, and their identification is a critical step toward understanding their biological functions. Moso bamboo has high ecological, economic and cultural value. Studies have shown that the growth of moso bamboo is influenced by various stresses. Several traditional identification methods are time-consuming and inefficient. Hence, developing efficient, high-accuracy computational methods for predicting orphan genes is of great significance.

RESULTS

In this paper, we propose a novel deep learning model (CNN + Transformer) for identifying orphan genes (OGs) in moso bamboo. It combines a convolutional neural network with a transformer neural network to capture k-mer amino acids, and the features between k-mer amino acids, in protein sequences. The experimental results show that on the moso bamboo dataset the average balanced accuracy of CNN + Transformer reaches 0.875 and the average Matthews correlation coefficient (MCC) reaches 0.471. On the same testing set, the balanced accuracy (BA), geometric mean (GM), bookmaker informedness (BM), and MCC values of the recurrent neural network, long short-term memory, gated recurrent unit, and transformer models are all lower than those of CNN + Transformer, which indicates that the model has a strong ability to identify OGs in moso bamboo.
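The four evaluation metrics named in the results (BA, GM, BM, MCC) have standard definitions in terms of a binary confusion matrix. The following is a minimal sketch, assuming those standard definitions are the ones used in the paper (the abstract does not spell them out). The function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute BA, GM, BM and MCC from binary confusion-matrix counts.

    Assumes the standard textbook definitions; degenerate cases
    (an empty class or a zero MCC denominator) fall back to 0.0.
    """
    # True positive rate (sensitivity) and true negative rate (specificity).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0

    ba = (tpr + tnr) / 2          # balanced accuracy
    gm = math.sqrt(tpr * tnr)     # geometric mean of TPR and TNR
    bm = tpr + tnr - 1            # bookmaker informedness

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"BA": ba, "GM": gm, "BM": bm, "MCC": mcc}


# Hypothetical, imbalanced example: 100 positives, 1000 negatives.
metrics = imbalance_metrics(tp=80, fp=100, tn=900, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced dataset such as this made-up one, MCC tends to be noticeably lower than BA because it also penalizes the false positives drawn from the large negative class. That is one way a model can report a high BA alongside a moderate MCC.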

CONCLUSIONS

The CNN + Transformer model is feasible and yields credible predictions. It may also provide a valuable reference for related research. To our knowledge, this is the first model to adopt deep learning techniques for identifying orphan genes in plants.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/002596dfd021/12859_2022_4702_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/a1f6b3501e3c/12859_2022_4702_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/5a468b46ff8c/12859_2022_4702_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/e38611407fd8/12859_2022_4702_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/39675ef4c0c2/12859_2022_4702_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/f8e7d072cb1d/12859_2022_4702_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/665ed6bbcbe2/12859_2022_4702_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/154e/9069780/5e9995f0a1b9/12859_2022_4702_Fig8_HTML.jpg

Similar articles

1
A deep learning approach for orphan gene identification in moso bamboo (Phyllostachys edulis) based on the CNN + Transformer model.
BMC Bioinformatics. 2022 May 5;23(1):162. doi: 10.1186/s12859-022-04702-1.
2
Genome-wide identification and expression analysis of LBD transcription factor genes in Moso bamboo (Phyllostachys edulis).
BMC Plant Biol. 2021 Jun 28;21(1):296. doi: 10.1186/s12870-021-03078-3.
3
Genome-wide identification and expression analysis of SBP-like transcription factor genes in Moso Bamboo (Phyllostachys edulis).
BMC Genomics. 2017 Jun 27;18(1):486. doi: 10.1186/s12864-017-3882-4.
4
Characterization of the floral transcriptome of Moso bamboo (Phyllostachys edulis) at different flowering developmental stages by transcriptome sequencing and RNA-seq analysis.
PLoS One. 2014 Jun 10;9(6):e98910. doi: 10.1371/journal.pone.0098910. eCollection 2014.
5
Genome-Wide analysis of the AAAP gene family in moso bamboo (Phyllostachys edulis).
BMC Plant Biol. 2017 Jan 31;17(1):29. doi: 10.1186/s12870-017-0980-z.
6
Large Scale Profiling of Protein Isoforms Using Label-Free Quantitative Proteomics Revealed the Regulation of Nonsense-Mediated Decay in Moso Bamboo ().
Cells. 2019 Jul 19;8(7):744. doi: 10.3390/cells8070744.
7
Expression Analysis and Regulation Network Identification of the CONSTANS-Like Gene Family in Moso Bamboo () Under Photoperiod Treatments.
DNA Cell Biol. 2019 Jul;38(7):607-626. doi: 10.1089/dna.2018.4611. Epub 2019 Jun 17.
8
Integrative lncRNA landscape reveals lncRNA-coding gene networks in the secondary cell wall biosynthesis pathway of moso bamboo (Phyllostachys edulis).
BMC Genomics. 2021 Sep 4;22(1):638. doi: 10.1186/s12864-021-07953-z.
9
Genome-wide identification and expression characterization of the DoG gene family of moso bamboo (Phyllostachys edulis).
BMC Genomics. 2022 May 10;23(1):357. doi: 10.1186/s12864-022-08551-3.
10
Genome-Wide Analysis of the AP2/ERF Transcription Factors Family and the Expression Patterns of DREB Genes in Moso Bamboo (Phyllostachys edulis).
PLoS One. 2015 May 18;10(5):e0126657. doi: 10.1371/journal.pone.0126657. eCollection 2015.

Cited by

1
ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes.
PLoS One. 2023 Oct 25;18(10):e0291260. doi: 10.1371/journal.pone.0291260. eCollection 2023.
2
CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model.
Plants (Basel). 2023 Apr 14;12(8):1652. doi: 10.3390/plants12081652.
3
Table Tennis Track Detection Based on Temporal Feature Multiplexing Network.
Sensors (Basel). 2023 Feb 3;23(3):1726. doi: 10.3390/s23031726.
4
Taxonomically Restricted Genes Are Associated With Responses to Biotic and Abiotic Stresses in Sugarcane ( spp.).
Front Plant Sci. 2022 Jun 30;13:923069. doi: 10.3389/fpls.2022.923069. eCollection 2022.

References

1
Genome-Wide Identification, Characterization and Function Analysis of Lineage-Specific Genes in the Tea Plant .
Front Genet. 2021 Nov 10;12:770570. doi: 10.3389/fgene.2021.770570. eCollection 2021.
2
Identification, characterization and expression analysis of lineage-specific genes within mangrove species Aegiceras corniculatum.
Mol Genet Genomics. 2021 Nov;296(6):1235-1247. doi: 10.1007/s00438-021-01810-0. Epub 2021 Aug 6.
3
A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates.
Front Genet. 2020 Sep 10;11:500064. doi: 10.3389/fgene.2020.500064. eCollection 2020.
4
KEGG: integrating viruses and cellular organisms.
Nucleic Acids Res. 2021 Jan 8;49(D1):D545-D551. doi: 10.1093/nar/gkaa970.
5
Gene Expression Profile Prediction in Uveal Melanoma Using Deep Learning: A Pilot Study for the Development of an Alternative Survival Prediction Tool.
Ophthalmol Retina. 2020 Dec;4(12):1213-1215. doi: 10.1016/j.oret.2020.06.023. Epub 2020 Jun 18.
6
Genome-wide identification, characterization and expression analysis of lineage-specific genes within Hanseniaspora yeasts.
FEMS Microbiol Lett. 2020 Jun 1;367(11). doi: 10.1093/femsle/fnaa077.
7
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.
8
Accurate classification of membrane protein types based on sequence and evolutionary information using deep learning.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):700. doi: 10.1186/s12859-019-3275-6.
9
Genome-Wide Investigation of the NAC Gene Family and Its Potential Association with the Secondary Cell Wall in Moso Bamboo.
Biomolecules. 2019 Oct 14;9(10):609. doi: 10.3390/biom9100609.
10
Deriving external forces via convolutional neural networks for biomedical image segmentation.
Biomed Opt Express. 2019 Jul 8;10(8):3800-3814. doi: 10.1364/BOE.10.003800. eCollection 2019 Aug 1.