
Decision of the Optimal Rank of a Nonnegative Matrix Factorization Model for Gene Expression Data Sets Utilizing the Unit Invariant Knee Method: Development and Evaluation of the Elbow Method for Rank Selection.

Author information

Guven Emine

Affiliation

Department of Biomedical Engineering, Düzce University, Düzce, Turkey.

Publication information

JMIR Bioinform Biotechnol. 2023 Jun 6;4:e43665. doi: 10.2196/43665.

DOI:10.2196/43665
PMID:38935969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11135234/
Abstract

BACKGROUND

There is a great need to develop a computational approach to analyze and exploit the information contained in gene expression data. Recent applications of nonnegative matrix factorization (NMF) in computational biology have shown that it can extract essential information from large data sets, in particular gene expression microarrays. A common problem in NMF is choosing the proper rank (r), that is, the number of factors in the reduced representation, but there is no agreement on which technique is most appropriate for this purpose. Various techniques have therefore been suggested for selecting the optimal factorization rank (r).
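
For orientation, the rank-selection problem described above can be written in standard NMF notation. The symbols V, W, H, n, and m are not used in the abstract and are introduced here only as conventional assumptions; the residual sum of squares (RSS) is the quantity the Methods section refers to.

```latex
% Standard NMF formulation (notation assumed, not taken from the abstract):
% V is the nonnegative data matrix (e.g., n genes by m samples), r is the rank.
\[
  V \approx W H, \qquad
  V \in \mathbb{R}_{\ge 0}^{n \times m},\quad
  W \in \mathbb{R}_{\ge 0}^{n \times r},\quad
  H \in \mathbb{R}_{\ge 0}^{r \times m}
\]
% Residual sum of squares of the rank-r fit (squared Frobenius norm):
\[
  \mathrm{RSS}(r) = \lVert V - W H \rVert_F^2
  = \sum_{i=1}^{n} \sum_{j=1}^{m} \bigl( V_{ij} - (W H)_{ij} \bigr)^2
\]
```

Rank selection then amounts to choosing r from how a criterion such as RSS(r) changes as r increases.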

OBJECTIVE

In this work, a new metric for rank selection is proposed based on the elbow method and is systematically compared against the cophenetic metric.

METHODS

To decide the optimal rank (r), this study focused on applying the unit invariant knee (UIK) method to NMF of gene expression data sets. The UIK method relies on an extremum distance estimator, which is used to identify an inflection point and, in turn, a knee point. Using gene expression data sets as the target matrix, the proposed method applies UIK to find the first inflection point of the residual sum of squares curve of the proposed algorithms.
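
The idea of locating a knee on a residual sum of squares curve without visual inspection can be illustrated with a small sketch. This is not the UIK method or its extremum distance estimator: it substitutes the simpler maximum-distance-to-chord heuristic as a stand-in, and the function name `knee_by_chord_distance` and the RSS values are hypothetical, chosen only for illustration.

```python
from math import hypot


def knee_by_chord_distance(ranks, rss):
    """Return the rank whose (rank, RSS) point lies farthest from the chord
    joining the first and last points of the curve.

    Stand-in heuristic for illustration only; NOT the UIK/EDE algorithm.
    """
    if len(ranks) != len(rss) or len(ranks) < 3:
        raise ValueError("need at least three (rank, rss) pairs of equal length")

    x_first, x_last = ranks[0], ranks[-1]
    y_min, y_max = min(rss), max(rss)
    if x_first == x_last or y_min == y_max:
        raise ValueError("curve is degenerate (no spread in ranks or RSS)")

    # Rescale both axes to [0, 1] so the answer does not depend on units.
    xs = [(x - x_first) / (x_last - x_first) for x in ranks]
    ys = [(y - y_min) / (y_max - y_min) for y in rss]

    # Perpendicular distance of each point from the first-to-last chord.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    chord_length = hypot(dx, dy)
    distances = [
        abs(dy * (x - xs[0]) - dx * (y - ys[0])) / chord_length
        for x, y in zip(xs, ys)
    ]

    # max() returns the first maximal index, so ties go to the smaller rank.
    best = max(range(len(distances)), key=distances.__getitem__)
    return ranks[best]


# Hypothetical RSS values for candidate ranks 2..8 (made up for illustration).
candidate_ranks = [2, 3, 4, 5, 6, 7, 8]
rss_values = [100.0, 60.0, 30.0, 25.0, 22.0, 20.0, 19.0]
print(knee_by_chord_distance(candidate_ranks, rss_values))
```

Rescaling both axes before measuring distances means the selected rank does not change if the RSS values are expressed in different units, which is in the spirit of a "unit invariant" criterion, although the actual UIK procedure differs.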

RESULTS

The UIK computation was carried out on gene expression data from acute lymphoblastic leukemia and acute myeloid leukemia samples, and the NMF results obtained with different algorithms were compared. The proposed UIK method is easy to perform, fast, requires no a priori rank value as input, and does not require initial parameters that significantly influence the model's functionality.

CONCLUSIONS

This study demonstrates that the elbow method provides a credible rank prediction for gene expression data and a precise estimate for simulated mutational processes data with known dimensions. The proposed UIK method is faster than conventional methods, including metrics that use the consensus matrix as a criterion for rank selection, and it achieves significantly better computational efficiency without visual inspection of the curves. Finally, the suggested rank tuning method based on the elbow method for gene expression data is arguably theoretically superior to the cophenetic measure.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3e1/11135234/d2fa5453ef73/bioinform_v4i1e43665_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3e1/11135234/be2b6883fade/bioinform_v4i1e43665_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3e1/11135234/78a895687daa/bioinform_v4i1e43665_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3e1/11135234/97e7e4110ee8/bioinform_v4i1e43665_fig4.jpg


