Self organization of a massive document collection.

Authors

Kohonen T, Kaski S, Lagus K, Salojarvi J, Honkela J, Paatero V, Saarela A

Affiliation

Neural Networks Research Centre, Helsinki University of Technology, Espoo, Finland.

Publication

IEEE Trans Neural Netw. 2000;11(3):574-85. doi: 10.1109/72.846729.

DOI:10.1109/72.846729
PMID:18249786
Abstract

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
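The pipeline the abstract outlines (weighted word histogram, then random projection to a lower dimension, then a self-organizing map) can be sketched at toy scale in pure Python. This is a hedged illustration, not the authors' system. The abstract does not specify the word weighting, the projection's sparsity, or the training schedule, so the IDF-style weight, the three randomly placed ones per word column, the linear decay of learning rate and neighborhood radius, and Euclidean best-match search below are all assumptions. The helper names (`weighted_histogram`, `sparse_projection`, `project`, `best_matching_unit`, `train_som`) are hypothetical. The paper's own scaling techniques for a 1,002,240-node map are not reproduced; the sizes here (4 documents, 8 dimensions, a 3 x 3 grid) stand in for 6,840,568 abstracts, 500 dimensions, and a million nodes.

```python
import math
import random
from collections import Counter


def weighted_histogram(doc, vocab, idf):
    """Word counts of one document, each scaled by an IDF-style weight (assumed weighting)."""
    counts = Counter(w for w in doc.lower().split() if w in idf)
    return [counts[w] * idf[w] for w in vocab]


def sparse_projection(n_words, n_dims, ones_per_word=3, seed=0):
    """Hypothetical sparse random projection: each word column gets a few 1s
    at randomly chosen output dimensions (all other entries are 0)."""
    rng = random.Random(seed)
    return [rng.sample(range(n_dims), ones_per_word) for _ in range(n_words)]


def project(hist, targets, n_dims):
    """Project a weighted histogram down to n_dims and normalize to unit length."""
    out = [0.0] * n_dims
    for i, x in enumerate(hist):
        if x:
            for j in targets[i]:
                out[j] += x
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(nodes[k], x)))


def train_som(data, rows, cols, epochs=20, seed=0):
    """Classic online SOM on a rows x cols grid with a Gaussian neighborhood
    whose radius and learning rate shrink linearly over training."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / total
            alpha = 0.5 * (1.0 - frac)                            # learning rate
            sigma = max(0.5, max(rows, cols) / 2 * (1.0 - frac))  # neighborhood radius
            br, bc = grid[best_matching_unit(nodes, x)]
            for k, (r, c) in enumerate(grid):
                h = alpha * math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                nodes[k] = [w + h * (xi - w) for w, xi in zip(nodes[k], x)]
            t += 1
    return nodes


# Toy usage: four "abstracts", an 8-dimensional projection, a 3 x 3 map.
docs = [
    "neural network learning algorithm",
    "neural network training data",
    "engine piston patent claim",
    "engine piston cylinder design",
]
vocab = sorted({w for d in docs for w in d.split()})
df = Counter(w for d in docs for w in set(d.split()))
idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}
targets = sparse_projection(len(vocab), n_dims=8, seed=1)
vectors = [project(weighted_histogram(d, vocab, idf), targets, 8) for d in docs]
som = train_som(vectors, rows=3, cols=3, epochs=50, seed=2)
labels = [best_matching_unit(som, v) for v in vectors]
print(labels)  # map node index assigned to each document
```

The random projection replaces an explicit vocabulary-sized vector with a short dense one whose inner products roughly preserve those of the original histograms, which is what makes the feature vectors cheap enough to feed into a very large map.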


Similar articles

1
Self organization of a massive document collection.
IEEE Trans Neural Netw. 2000;11(3):574-85. doi: 10.1109/72.846729.
2
Marginal median SOM for document organization and retrieval.
Neural Netw. 2004 Apr;17(3):365-77. doi: 10.1016/j.neunet.2003.08.008.
3
Ranked centroid projection: a data visualization approach with self-organizing maps.
IEEE Trans Neural Netw. 2008 Feb;19(2):245-59. doi: 10.1109/TNN.2007.905858.
4
Visualizing the topical structure of the medical sciences: a self-organizing map approach.
PLoS One. 2013;8(3):e58779. doi: 10.1371/journal.pone.0058779. Epub 2013 Mar 12.
5
Class distributions on SOM surfaces for feature extraction and object retrieval.
Neural Netw. 2004 Oct-Nov;17(8-9):1121-33. doi: 10.1016/j.neunet.2004.07.007.
6
Contextual self-organizing map: software for constructing semantic representations.
Behav Res Methods. 2011 Mar;43(1):77-88. doi: 10.3758/s13428-010-0042-z.
7
Font adaptive word indexing of modern printed documents.
IEEE Trans Pattern Anal Mach Intell. 2006 Aug;28(8):1187-99. doi: 10.1109/TPAMI.2006.162.
8
Improving cluster visualization in self-organizing maps: application in gene expression data analysis.
Comput Biol Med. 2007 Dec;37(12):1677-89. doi: 10.1016/j.compbiomed.2007.04.003. Epub 2007 Jun 4.
9
Modified self-organizing feature map algorithms for efficient digital hardware implementation.
IEEE Trans Neural Netw. 1997;8(2):315-30. doi: 10.1109/72.557669.
10
Self-organizing neural projections.
Neural Netw. 2006 Jul-Aug;19(6-7):723-33. doi: 10.1016/j.neunet.2006.05.001. Epub 2006 Jun 12.

Cited by

1
Understanding generational differences in digital skills and recreational behaviour for effective visitor management in forest destinations.
Sci Rep. 2025 May 23;15(1):17887. doi: 10.1038/s41598-025-02036-5.
2
Incipient functional SARS-CoV-2 diversification identified through neural network haplotype maps.
Proc Natl Acad Sci U S A. 2024 Mar 5;121(10):e2317851121. doi: 10.1073/pnas.2317851121. Epub 2024 Feb 28.
3
Unsupervised Spiking Neural Network with Dynamic Learning of Inhibitory Neurons.
Sensors (Basel). 2023 Aug 17;23(16):7232. doi: 10.3390/s23167232.
4
A Study on User-Oriented Subjects of Child Abuse on Wikipedia: Temporal Analysis of Wikipedia History Versions and Traffic Data.
J Med Internet Res. 2023 Jul 17;25:e43901. doi: 10.2196/43901.
5
Application of self-organizing maps to AFM-based viscoelastic characterization of breast cancer cell mechanics.
Sci Rep. 2023 Feb 22;13(1):3087. doi: 10.1038/s41598-023-30156-3.
6
Machine Learning Analysis of Essential Oils from Cuban Plants: Potential Activity against Protozoa Parasites.
Molecules. 2022 Feb 17;27(4):1366. doi: 10.3390/molecules27041366.
7
A Two-Level, Intramutant Spectrum Haplotype Profile of Hepatitis C Virus Revealed by Self-Organized Maps.
Microbiol Spectr. 2021 Dec 22;9(3):e0145921. doi: 10.1128/Spectrum.01459-21. Epub 2021 Nov 10.
8
SOM-LWL method for identification of COVID-19 on chest X-rays.
PLoS One. 2021 Feb 24;16(2):e0247176. doi: 10.1371/journal.pone.0247176. eCollection 2021.
9
Machine Learning-Assisted High-Throughput Molecular Dynamics Simulation of High-Mechanical Performance Carbon Nanotube Structure.
Nanomaterials (Basel). 2020 Dec 9;10(12):2459. doi: 10.3390/nano10122459.
10
Using machine learning to understand the implications of meteorological conditions for fish kills.
Sci Rep. 2020 Oct 12;10(1):17003. doi: 10.1038/s41598-020-73922-3.