An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF.

Affiliations

College of Plant Protection, Hunan Agricultural University, Changsha 410128, China.

Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China.

Publication information

Genes (Basel). 2023 Feb 6;14(2):421. doi: 10.3390/genes14020421.

DOI: 10.3390/genes14020421
PMID: 36833348
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9957060/
Abstract

Gene families, which form part of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on characteristics of gene families such as function, homology, or phenotype. However, statistical and correlation analyses of how gene family members are distributed in the genome have yet to be conducted. Here, we report a novel framework that combines gene family analysis with genome selection based on NMF-ReliefF. Specifically, the proposed method first obtains gene families from the TreeFam database and determines the number of gene families within the feature matrix. NMF-ReliefF, a new feature selection algorithm that overcomes the inefficiencies of traditional methods, is then used to select features from the gene feature matrix. Finally, a support vector machine classifies the selected features. The framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also evaluated the NMF-ReliefF algorithm on four microarray gene data sets. The results suggest that the proposed method may strike a balance between robustness and discrimination. Additionally, the proposed method's classification performance is superior to that of state-of-the-art feature selection approaches.
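The abstract does not say how NMF and ReliefF are combined, so the following is only a minimal sketch of one plausible chaining, not the authors' algorithm. It assumes a non-negative matrix of gene family counts (rows are genomes, columns are families), uses NMF loadings to shortlist candidate columns, and then ranks the shortlist with a simplified ReliefF score. The function names (`nmf`, `relieff_scores`, `select_features`), all parameter values, and the toy data are hypothetical. The SVM step is omitted; the selected columns would be passed to a classifier afterwards.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (samples x features) as W @ H using
    multiplicative updates for the squared-error loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relieff_scores(X, y, k=3):
    """Simplified ReliefF: a feature gains weight when it differs on the
    k nearest misses and loses weight when it differs on the k nearest hits."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid dividing by zero
    Xn = (X - X.min(axis=0)) / span            # scale each feature to [0, 1]
    n, m = Xn.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(m)
    for i in range(n):
        diff = np.abs(Xn - Xn[i])              # per-feature difference to sample i
        dist = diff.sum(axis=1)                # Manhattan distance
        dist[i] = np.inf                       # never pick the sample itself
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            contrib = diff[nearest].mean(axis=0)
            if c == y[i]:
                w -= contrib                   # nearest hits
            else:
                w += prior[c] / (1 - prior[y[i]]) * contrib  # nearest misses
    return w / n

def select_features(X, y, rank=2, n_candidates=6, n_selected=2):
    """Assumed combination: shortlist by NMF loading, then rank by ReliefF."""
    W, H = nmf(X, rank)
    # Rescale H rows by W column norms so loadings are comparable across components.
    loading = (H * np.linalg.norm(W, axis=0)[:, None]).max(axis=0)
    candidates = np.argsort(loading)[::-1][:n_candidates]
    scores = relieff_scores(X[:, candidates], y)
    return candidates[np.argsort(scores)[::-1][:n_selected]]

# Toy data (fabricated for illustration): 20 genomes, 8 gene families.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.integers(0, 3, size=(20, 8)).astype(float)  # background counts 0..2
X[:10, 0] += 9   # family 0 expanded in class 0
X[10:, 1] += 9   # family 1 expanded in class 1

selected = select_features(X, y)
print(sorted(int(j) for j in selected))
```

On this toy matrix the two planted columns are the ones that separate the classes, so they should rank highest. The shortlist size, the NMF rank, and the neighbour count `k` are arbitrary here and would need tuning on real data.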

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/9957060/fbee6b5b31ac/genes-14-00421-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/9957060/b4fd454c26e4/genes-14-00421-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/9957060/75f111d53937/genes-14-00421-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/9957060/028cc1bf2cc4/genes-14-00421-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/9957060/22eb1b00a784/genes-14-00421-g005.jpg
