Discovering interesting molecular substructures for molecular classification.

Affiliation

Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong.

Publication information

IEEE Trans Nanobioscience. 2010 Jun;9(2):77-89. doi: 10.1109/TNB.2010.2042609.

DOI:10.1109/TNB.2010.2042609
PMID:20650702
Abstract

Given a set of molecular structure data preclassified into a number of classes, the molecular classification problem is concerned with discovering interesting structural patterns in the data so that "unseen" molecules not originally in the dataset can be accurately classified. To tackle the problem, interesting molecular substructures have to be discovered. This is typically done by first representing molecular structures as molecular graphs and then using graph-mining algorithms to discover frequently occurring subgraphs in them. These subgraphs are then used to characterize different classes for molecular classification. While such an approach can be very effective, a substructure that occurs frequently in one class may also occur in another. Discovering frequent subgraphs may therefore not always be the most effective approach to molecular classification. In this paper, we propose a novel technique called mining interesting substructures in molecular data for classification (MISMOC) that can discover interesting frequent subgraphs not just for characterizing a molecular class but also for distinguishing it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine whether it is interesting. For those that are interesting, their degrees of interestingness are determined using an information-theoretic measure. When classifying an unseen molecule, its structure is matched against the interesting subgraphs in each class, and a total interestingness measure for classifying the unseen molecule into a particular class is determined from the interestingness of each matched subgraph. The performance of MISMOC is evaluated using both artificial and real datasets, and the results show that it can be an effective approach for molecular classification.
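The screen-then-weight-then-sum pipeline described in the abstract can be illustrated with a small sketch. The abstract does not say which test statistic or which information-theoretic measure MISMOC uses, so the choices below are assumptions made only for illustration: an adjusted residual on the subgraph-by-class contingency table serves as the screening statistic, and a smoothed log-likelihood ratio (weight of evidence) serves as the interestingness measure. Molecules are also simplified to sets of already-mined substructure identifiers, which sidesteps the graph-mining and subgraph-matching steps entirely. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def train(molecules, labels, z_crit=1.96):
    """Return {class: {substructure: weight}} for substructures that pass the screen.

    molecules: list of sets of substructure identifiers (assumed pre-mined).
    labels:    class label of each molecule, in the same order.
    z_crit:    screening threshold on the adjusted residual (assumed statistic).
    """
    n = len(molecules)
    class_count = Counter(labels)
    sub_count = Counter(s for m in molecules for s in m)
    joint = Counter((s, c) for m, c in zip(molecules, labels) for s in m)

    weights = {c: {} for c in class_count}
    for s, n_s in sub_count.items():
        for c, n_c in class_count.items():
            if n_s == n or n_c == n:
                continue  # present everywhere: carries no class information
            observed = joint[(s, c)]
            expected = n_s * n_c / n
            variance = expected * (1 - n_s / n) * (1 - n_c / n)
            z = (observed - expected) / math.sqrt(variance)
            if abs(z) < z_crit:
                continue  # frequent perhaps, but not interesting for this class
            # Assumed measure: smoothed log ratio of P(s | c) to P(s | not c).
            p_in = (observed + 0.5) / (n_c + 1)
            p_out = (n_s - observed + 0.5) / (n - n_c + 1)
            weights[c][s] = math.log(p_in / p_out)
    return weights


def classify(molecule, weights):
    """Sum the weights of matched substructures per class and pick the largest total."""
    scores = {
        c: sum(w for s, w in ws.items() if s in molecule)
        for c, ws in weights.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: "ring" occurs in every molecule, so it is frequent but uninformative.
molecules = [{"ring", "nitro"}] * 10 + [{"ring", "hydroxyl"}] * 10
labels = ["toxic"] * 10 + ["safe"] * 10

weights = train(molecules, labels)
label, scores = classify({"ring", "nitro"}, weights)
print(label, scores)
```

In the toy data, "ring" is the most frequent substructure yet receives no weight in either class, which mirrors the abstract's point that frequency alone does not make a subgraph useful for classification. Negative weights mean that a matched substructure counts as evidence against a class.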

Similar articles

1. Discovering interesting molecular substructures for molecular classification.
   IEEE Trans Nanobioscience. 2010 Jun;9(2):77-89. doi: 10.1109/TNB.2010.2042609.
2. Incremental fuzzy mining of gene expression data for gene function prediction.
   IEEE Trans Biomed Eng. 2011 May;58(5):1246-52. doi: 10.1109/TBME.2010.2047724. Epub 2010 Apr 15.
3. Coupling Graphs, Efficient Algorithms and B-Cell Epitope Prediction.
   IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):7-16. doi: 10.1109/TCBB.2013.136.
4. Mining the Enriched Subgraphs for Specific Vertices in a Biological Graph.
   IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1496-1507. doi: 10.1109/TCBB.2016.2576440. Epub 2016 Jun 7.
5. Subgraph queries by context-free grammars.
   J Integr Bioinform. 2008 Aug 25;5(2):100. doi: 10.2390/biecoll-jib-2008-100.
6. MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.
   IEEE Trans Cybern. 2018 May;48(5):1369-1382. doi: 10.1109/TCYB.2017.2693558. Epub 2017 Apr 25.
7. Mining coherent dense subgraphs across massive biological networks for functional discovery.
   Bioinformatics. 2005 Jun;21 Suppl 1:i213-21. doi: 10.1093/bioinformatics/bti1049.
8. Accurate classification of protein structural families using coherent subgraph analysis.
   Pac Symp Biocomput. 2004:411-22. doi: 10.1142/9789812704856_0039.
9. Molecule kernels: a descriptor- and alignment-free quantitative structure-activity relationship approach.
   J Chem Inf Model. 2008 Sep;48(9):1868-81. doi: 10.1021/ci800144y. Epub 2008 Sep 4.
10. An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data.
    IEEE Trans Nanobioscience. 2009 Sep;8(3):252-8. doi: 10.1109/TNB.2009.2026747. Epub 2009 Jul 14.