Alghamdi Elham, Rushe Ellen, Mac Namee Brian, Greene Derek
University College Dublin, Dublin, Ireland.
Appl Netw Sci. 2020;5(1):98. doi: 10.1007/s41109-020-00340-9. Epub 2020 Dec 11.
In many real applications of semi-supervised learning, the guidance provided by a human oracle might be "noisy" or inaccurate. Human annotators are often imperfect: they can make subjective decisions, they might have only partial knowledge of the task at hand, or they may simply complete a labeling task incorrectly due to the burden of annotation. Similarly, in the context of semi-supervised community finding in complex networks, information encoded as pairwise constraints may be unreliable or conflicting due to the human element in the annotation process. This study aims to address the challenge of handling noisy pairwise constraints in overlapping semi-supervised community detection by framing the task as an outlier detection problem. We propose a general architecture which includes a process to "clean" or filter noisy constraints. Furthermore, we introduce multiple designs for the cleaning process which use different types of outlier detection models, including autoencoders. A comprehensive evaluation is conducted for each proposed methodology, which demonstrates the potential of the proposed architecture for reducing the impact of noisy supervision in the context of overlapping community detection.
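To make the idea of "cleaning" pairwise constraints through outlier detection more concrete, the following is a minimal sketch and not the authors' method. It assumes each constraint is a must-link or cannot-link pair of nodes, scores each pair by the Jaccard similarity of the two nodes' closed neighbourhoods, and flags pairs whose score is a z-score outlier among constraints of the same kind. The similarity feature, the z-score detector (used here in place of the autoencoders mentioned in the abstract), the threshold of 2.0, and all function names are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, pstdev


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def jaccard(graph, u, v):
    """Jaccard similarity of the closed neighbourhoods of u and v."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def filter_constraints(graph, constraints, threshold=2.0):
    """Split constraints into (kept, dropped).

    constraints: list of (u, v, kind) with kind in {"must", "cannot"}.
    Assumption for this sketch: a must-link pair is suspicious when its
    similarity is unusually low for its group, and a cannot-link pair
    when its similarity is unusually high.
    """
    kept, dropped = [], []
    for kind in ("must", "cannot"):
        group = [c for c in constraints if c[2] == kind]
        if not group:
            continue
        scores = [jaccard(graph, u, v) for u, v, _ in group]
        mu, sigma = mean(scores), pstdev(scores)
        for constraint, score in zip(group, scores):
            # With zero spread there is no basis for calling anything an outlier.
            z = (score - mu) / sigma if sigma > 0 else 0.0
            suspicious = z < -threshold if kind == "must" else z > threshold
            (dropped if suspicious else kept).append(constraint)
    return kept, dropped


# Toy network: two 5-node cliques joined by a single bridge edge (4, 5).
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 10), 2)) + [(4, 5)]
graph = build_graph(edges)

constraints = [
    (0, 1, "must"), (1, 2, "must"), (2, 3, "must"), (0, 3, "must"),
    (5, 6, "must"), (6, 7, "must"), (7, 8, "must"),
    (0, 9, "must"),      # deliberately noisy: nodes lie in different cliques
    (0, 6, "cannot"), (1, 7, "cannot"), (2, 8, "cannot"),
    (3, 9, "cannot"), (1, 6, "cannot"),
    (7, 9, "cannot"),    # deliberately noisy: nodes lie in the same clique
]

kept, dropped = filter_constraints(graph, constraints)
print("dropped:", dropped)
```

Only the constraints in `kept` would then be passed to the semi-supervised community detection step. A z-score detector is chosen here only because it is short and dependency-free; it is a stand-in for whichever outlier model is actually used, and it becomes unreliable when a large share of the constraints are noisy.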