Examining unsupervised ensemble learning using spectroscopy data of organic compounds.

Authors

He Kedan, Massena Djenerly G

Affiliation

Department of Physical Sciences, School of Arts and Sciences, Eastern Connecticut State University, Willimantic, CT, 06226, USA.

Publication

J Comput Aided Mol Des. 2023 Jan;37(1):17-37. doi: 10.1007/s10822-022-00488-9. Epub 2022 Nov 21.

DOI:10.1007/s10822-022-00488-9
PMID:36404382
Abstract

One solution to the challenge of choosing an appropriate clustering algorithm is to combine different clusterings into a single consensus clustering result, known as a cluster ensemble (CE). This ensemble learning strategy can provide more robust and stable solutions across different domains and datasets. Unfortunately, not all clusterings in the ensemble contribute to the final data partition. Cluster ensemble selection (CES) aims to select a subset from a large library of clustering solutions to form a smaller cluster ensemble that performs as well as or better than the set of all available clustering solutions. In this paper, we investigate four CES methods for the categorization of structurally distinct organic compounds using high-dimensional IR and Raman spectroscopy data. The single quality selection (SQI) method is used with various quality indices to form a subset of the ensemble from its highest-quality members. The Bagging method, usually applied in supervised learning, ranks ensemble members by calculating the normalized mutual information (NMI) between ensemble members and consensus solutions generated from a randomly sampled subset of the full ensemble. The hierarchical cluster and select method (HCAS-SQI) uses the diversity matrix of ensemble members to select a diverse set of ensemble members with the highest quality. Furthermore, a combining strategy (HCAS-MQI) can merge subsets selected using multiple quality indices to refine the clustering solutions in the ensemble. The IR + Raman hybrid ensemble library is created by merging two complementary "views" of the organic compounds. This inherently more diverse library gives the best full ensemble consensus results. Overall, the Bagging method is recommended because it provides the most robust results, which are better than or comparable to the full ensemble consensus solutions.
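The Bagging-style ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which consensus function, sampling scheme, or subset size was used, so the co-association consensus, sampling without replacement, the helper names `consensus_labels` and `bagging_rank`, and all parameter values below are assumptions made only for illustration. Synthetic blobs stand in for the IR and Raman spectra.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score


def consensus_labels(members, n_clusters):
    """Consensus partition via a co-association matrix (an assumed consensus function)."""
    members = np.asarray(members)
    # Fraction of ensemble members that put each pair of samples in the same cluster.
    coassoc = np.mean(members[:, :, None] == members[:, None, :], axis=0)
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    # Average-linkage hierarchical clustering on the co-association distances.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def bagging_rank(members, n_clusters, n_rounds=20, sample_frac=0.5, seed=0):
    """Rank members by mean NMI against consensus solutions of random sub-ensembles."""
    members = np.asarray(members)
    rng = np.random.default_rng(seed)
    m = len(members)
    size = min(m, max(2, int(sample_frac * m)))
    scores = np.zeros(m)
    for _ in range(n_rounds):
        # Assumption: sub-ensembles are drawn without replacement.
        idx = rng.choice(m, size=size, replace=False)
        consensus = consensus_labels(members[idx], n_clusters)
        scores += np.array(
            [normalized_mutual_info_score(consensus, labels) for labels in members]
        )
    scores /= n_rounds
    # Highest mean NMI first.
    return np.argsort(scores)[::-1], scores


# Toy ensemble library: k-means runs with different k and seeds on synthetic data.
X, _ = make_blobs(n_samples=120, centers=3, random_state=0)
library = [
    KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
    for k in (2, 3, 4)
    for s in range(4)
]

order, scores = bagging_rank(library, n_clusters=3)
selected = [library[i] for i in order[:5]]          # keep the five top-ranked members
final = consensus_labels(selected, n_clusters=3)    # consensus of the selected subset
print("ranking:", order.tolist())
print("clusters in selected-subset consensus:", len(set(final.tolist())))
```

In this sketch the selected subset is simply the top five members by mean NMI; how many members to keep is a free choice here and is not taken from the paper.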
