From paragraph to graph: latent semantic analysis for information visualization.

Authors

Landauer Thomas K, Laham Darrell, Derr Marcia

Affiliation

Department of Psychology, University of Colorado, Boulder, CO 80309-0345, USA.

Publication

Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5214-9. doi: 10.1073/pnas.0400341101. Epub 2004 Mar 22.

DOI:10.1073/pnas.0400341101
PMID:15037748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC387298/

Abstract

Most techniques for relating textual information rely on intellectually created links such as author-chosen keywords and titles, authority indexing terms, or bibliographic citations. Similarity of the semantic content of whole documents, rather than just titles, abstracts, or overlap of keywords, offers an attractive alternative. Latent semantic analysis provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations. However, latent semantic analysis correlations with human text-to-text similarity judgments are often empirically highest at approximately 300 dimensions. Thus, two- or three-dimensional visualizations are severely limited in what they can show, and the first and/or second automatically discovered principal component, or any three such for that matter, rarely capture all of the relations that might be of interest. It is our conjecture that linguistic meaning is intrinsically and irreducibly very high dimensional. Thus, some method to explore a high dimensional similarity space is needed. But the 2.7 × 10⁷ projections and infinite rotations of, for example, a 300-dimensional pattern are impossible to examine. We suggest, however, that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system to extract information about objects and from moving patterns, can often succeed in discovering multiple revealing views that are missed by current computational algorithms. We show some examples of the use of latent semantic analysis to support such visualizations and offer views on future needs.
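
The dimension reduction described in the abstract can be illustrated with a minimal sketch. It assumes latent semantic analysis is approximated by a truncated singular value decomposition of a raw term-by-document count matrix. The toy vocabulary, the counts, and the helper names `lsa_document_vectors` and `cosine` are hypothetical, and the weighting steps used in real applications are omitted. The last lines show one possible reading of the projection count: the abstract does not say how the figure was obtained, but counting ordered triples of distinct axes in 300 dimensions gives a number of about that size.

```python
import math
import numpy as np


def lsa_document_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return one k-dimensional vector per document via truncated SVD.

    term_doc has shape (n_terms, n_docs). This is a simplified stand-in
    for LSA, not the authors' exact pipeline.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    # Document coordinates: right singular vectors scaled by singular values.
    return (vt[:k, :] * s[:k, None]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy counts: rows are terms, columns are documents.
# Document 0 says "doctor", document 1 says "physician"; both words
# co-occur with "nurse" in document 2. Document 3 is about cars.
counts = np.array(
    [
        [2, 0, 1, 0],  # doctor
        [0, 2, 1, 0],  # physician
        [1, 1, 2, 0],  # nurse
        [0, 0, 0, 2],  # engine
        [0, 0, 0, 1],  # car
    ],
    dtype=float,
)

docs = lsa_document_vectors(counts, k=2)

# Similarity of documents 0 and 1 on raw keyword overlap versus in the
# reduced space, where the two near-synonyms share a latent dimension.
sim_raw = cosine(counts[:, 0], counts[:, 1])
sim_lsa = cosine(docs[0], docs[1])
print(f"raw: {sim_raw:.2f}, reduced: {sim_lsa:.2f}")

# Assumed reading of the projection count: ordered triples of distinct
# axes in a 300-dimensional space.
n_dims = 300
ordered_triples = math.perm(n_dims, 3)  # 300 * 299 * 298
unordered_triples = math.comb(n_dims, 3)
print(ordered_triples, unordered_triples)
```

Two dimensions are enough to separate the toy topics here, whereas the abstract reports that agreement with human similarity judgments on real corpora tends to peak near 300 dimensions, which is why a fixed two- or three-dimensional view discards most of the structure.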

Similar articles

1. From paragraph to graph: latent semantic analysis for information visualization.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5214-9. doi: 10.1073/pnas.0400341101. Epub 2004 Mar 22.
2. Text Influenced Molecular Indexing (TIMI): a literature database mining approach that handles text and chemistry.
J Chem Inf Comput Sci. 2003 May-Jun;43(3):743-52. doi: 10.1021/ci025587a.
3. Modeling semantic aspects for cross-media image indexing.
IEEE Trans Pattern Anal Mach Intell. 2007 Oct;29(10):1802-17. doi: 10.1109/TPAMI.2007.1097.
4. Multidimensional latent semantic analysis using term spatial information.
IEEE Trans Cybern. 2013 Dec;43(6):1625-40. doi: 10.1109/TSMCC.2012.2227112.
5. Font adaptive word indexing of modern printed documents.
IEEE Trans Pattern Anal Mach Intell. 2006 Aug;28(8):1187-99. doi: 10.1109/TPAMI.2006.162.
6. More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis.
Behav Res Methods. 2009 Aug;41(3):647-56. doi: 10.3758/BRM.41.3.647.
7. Graph visualization techniques for web clustering engines.
IEEE Trans Vis Comput Graph. 2007 Mar-Apr;13(2):294-304. doi: 10.1109/TVCG.2007.40.
8. Graph signatures for visual analytics.
IEEE Trans Vis Comput Graph. 2006 Nov-Dec;12(6):1399-413. doi: 10.1109/TVCG.2006.92.
9. Large-scale latent semantic analysis.
Behav Res Methods. 2011 Jun;43(2):414-23. doi: 10.3758/s13428-010-0050-z.
10. New algorithms assessing short summaries in expository texts using latent semantic analysis.
Behav Res Methods. 2009 Aug;41(3):944-50. doi: 10.3758/BRM.41.3.944.

Cited by

1. A text mining and network analysis of topics and trends in major nursing research journals.
Nurs Open. 2024 Jan;11(1):e2050. doi: 10.1002/nop2.2050.
2. Gene-regulatory network analysis of ankylosing spondylitis with a single-cell chromatin accessible assay.
Sci Rep. 2020 Nov 10;10(1):19411. doi: 10.1038/s41598-020-76574-5.
3. β-Arrestin Based Receptor Signaling Paradigms: Potential Therapeutic Targets for Complex Age-Related Disorders.
Front Pharmacol. 2018 Nov 28;9:1369. doi: 10.3389/fphar.2018.01369. eCollection 2018.
4. Altered selection during language processing in individuals at high risk for psychosis.
Schizophr Res. 2018 Dec;202:303-309. doi: 10.1016/j.schres.2018.06.036. Epub 2018 Jun 20.
5. Using text analysis to quantify the similarity and evolution of scientific disciplines.
R Soc Open Sci. 2018 Jan 17;5(1):171545. doi: 10.1098/rsos.171545. eCollection 2018 Jan.
6. Probing the topological properties of complex networks modeling short written texts.
PLoS One. 2015 Feb 26;10(2):e0118394. doi: 10.1371/journal.pone.0118394. eCollection 2015.
7. Empirical study using network of semantically related associations in bridging the knowledge gap.
J Transl Med. 2014 Nov 27;12:324. doi: 10.1186/s12967-014-0324-9.
8. Plurigon: three dimensional visualization and classification of high-dimensionality data.
Front Physiol. 2013 Jul 22;4:190. doi: 10.3389/fphys.2013.00190. eCollection 2013.
9. Literature aided determination of data quality and statistical significance threshold for gene expression studies.
BMC Genomics. 2012;13 Suppl 8(Suppl 8):S23. doi: 10.1186/1471-2164-13-S8-S23. Epub 2012 Dec 17.
10. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
PLoS One. 2011 Apr 14;6(4):e18851. doi: 10.1371/journal.pone.0018851.

References

1. An unsupervised method for the extraction of propositional information from text.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5206-13. doi: 10.1073/pnas.0307758101. Epub 2004 Mar 15.
2. Mixed-membership models of scientific publications.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5220-7. doi: 10.1073/pnas.0307760101. Epub 2004 Mar 12.
3. Finding scientific topics.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5228-35. doi: 10.1073/pnas.0307752101. Epub 2004 Feb 10.