Dai Ben, Shen Xiaotong, Wong Wing
School of Statistics, University of Minnesota, Minneapolis, MN 55455.
Department of Statistics and Biomedical Data Science, Stanford University, CA 94305.
J Am Stat Assoc. 2022;117(539):1243-1253. doi: 10.1080/01621459.2020.1844719. Epub 2021 Jan 4.
Instance generation creates representative examples to interpret a learning model, as in regression and classification. For example, representative sentences of a topic of interest describe the topic specifically for sentence categorization. In such a situation, a large number of unlabeled observations may be available in addition to labeled data; for example, many unclassified text corpora (unlabeled instances) are available with only a few classified sentences (labeled instances). In this article, we introduce a novel generative method, called a coupled generator, producing instances given a specific learning outcome, based on indirect and direct generators. The indirect generator uses the inverse principle to yield the corresponding inverse probability, making it possible to generate instances by leveraging unlabeled data. The direct generator learns the distribution of an instance given its learning outcome. The coupled generator then selects the better of the indirect and direct generators, and is designed to enjoy the benefits of both and deliver higher generation accuracy. For sentence generation given a topic, we develop an embedding-based regression/classification in conjunction with an unconditional recurrent neural network for the indirect generator, whereas a conditional recurrent neural network is natural for the corresponding direct generator. Moreover, we derive finite-sample generation error bounds for the indirect and direct generators to reveal the generative aspects of both methods, thus explaining the benefits of the coupled generator. Finally, we apply the proposed methods to a real benchmark of abstract classification and demonstrate that the coupled generator composes reasonably good sentences from a dictionary to describe a specific topic of interest.
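A minimal sketch of the coupled-generator idea on a toy discrete problem, assuming the paper's setup but with all distributions, sample sizes, and function names invented here for illustration. The indirect generator inverts a fitted predictive model via Bayes' rule, p(x|y) ∝ p(y|x)p(x), exploiting the unlabeled data through the marginal p(x); the direct generator estimates p(x|y) from labeled pairs alone; the coupled generator keeps whichever conditional fits the labeled data better (in practice a held-out split would be used for this comparison).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground truth: instances x in {0,1,2,3}, outcomes y in {0,1}.
true_p_x = np.array([0.4, 0.3, 0.2, 0.1])            # marginal p(x)
true_p_y_given_x = np.array([[0.9, 0.1],             # rows: x, cols: y
                             [0.7, 0.3],
                             [0.3, 0.7],
                             [0.1, 0.9]])

# Many unlabeled draws of x, only a few labeled (x, y) pairs.
x_unlab = rng.choice(4, size=5000, p=true_p_x)
x_lab = rng.choice(4, size=200, p=true_p_x)
y_lab = np.array([rng.choice(2, p=true_p_y_given_x[x]) for x in x_lab])

# Indirect generator: estimate p(x) from the unlabeled data and
# p(y|x) from the labeled data, then invert: p(x|y) ∝ p(y|x) p(x).
p_x_hat = np.bincount(x_unlab, minlength=4) / len(x_unlab)
p_y_given_x_hat = np.ones((4, 2))                    # Laplace smoothing
for x, y in zip(x_lab, y_lab):
    p_y_given_x_hat[x, y] += 1
p_y_given_x_hat /= p_y_given_x_hat.sum(axis=1, keepdims=True)

def indirect_p_x_given_y(y):
    unnorm = p_y_given_x_hat[:, y] * p_x_hat
    return unnorm / unnorm.sum()

# Direct generator: estimate p(x|y) from the labeled pairs only.
def direct_p_x_given_y(y):
    counts = np.bincount(x_lab[y_lab == y], minlength=4) + 1.0
    return counts / counts.sum()

# Coupled generator: keep whichever conditional assigns higher
# log-likelihood to the labeled pairs.
def loglik(cond):
    return sum(np.log(cond(y)[x]) for x, y in zip(x_lab, y_lab))

best = max([indirect_p_x_given_y, direct_p_x_given_y], key=loglik)
sample = rng.choice(4, p=best(1))   # generate an instance for outcome y=1
```

With abundant unlabeled data the indirect route benefits from a sharper estimate of p(x), while the direct route avoids inversion error; the coupled rule hedges between the two, mirroring the comparison formalized by the paper's finite-sample error bounds.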