Cid-Sueiro J, Arribas J I, Urbán-Muñoz S, Figueiras-Vidal A R
Departamento de Teoría de la Señal y Comunicaciones e Ing. Telemática, ETSIT, Universidad de Valladolid, Campus Miguel Delibes s/n, 47011 Valladolid, Spain.
IEEE Trans Neural Netw. 1999;10(3):645-56. doi: 10.1109/72.761724.
This paper addresses the problem of designing cost functions to estimate a posteriori probabilities in multiclass problems. We establish necessary and sufficient conditions that these costs must satisfy in one-class one-output networks whose outputs are consistent with probability laws. We focus on a particular subset of the corresponding cost functions: those that satisfy two often desirable properties, symmetry and separability (well-known cost functions, such as the quadratic cost and the cross-entropy, are particular cases in this subset). Finally, we present a universal stochastic gradient learning rule for single-layer networks, in the sense that it minimizes a general version of these cost functions for a wide family of nonlinear activation functions.
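To make the setting concrete, the following is a minimal sketch (not the paper's general rule) of a stochastic gradient update for a single-layer softmax network minimizing the cross-entropy, one of the well-known members of the symmetric, separable family mentioned above; under such a cost the network outputs converge toward estimates of the a posteriori class probabilities. All names, dimensions, and the learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: outputs sum to 1, consistent with probability laws
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_step(W, x, y, lr=0.1):
    """One stochastic gradient step on the cross-entropy cost for a
    single-layer network with one output per class (hypothetical sketch).

    W : (C, d) weight matrix, x : (d,) input, y : true class index.
    """
    p = softmax(W @ x)          # network outputs: estimated posteriors
    t = np.zeros_like(p)
    t[y] = 1.0                  # one-hot target
    # For softmax + cross-entropy, the gradient w.r.t. W is (p - t) x^T
    W -= lr * np.outer(p - t, x)
    return W, p

# Toy usage: repeated updates on one labeled sample drive the
# estimated posterior of the true class toward 1
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 4))
x = rng.normal(size=4)
for _ in range(200):
    W, p = sgd_step(W, x, y=1)
```

The paper's contribution generalizes this familiar pairing: it characterizes the whole family of costs (beyond cross-entropy and the quadratic cost) whose minimization yields posterior probability estimates, and gives a single learning rule valid across a wide family of activation functions.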