Weighted contrastive divergence
| Authors: | , , , |
|---|---|
| Resource type: | article |
| Publication date: | 2019 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/133368 |
| Online access: | https://hdl.handle.net/2117/133368 https://dx.doi.org/10.1016/j.neunet.2018.09.013 |
| Access Level: | open access |
| Keywords: | Neural networks (Computer science); Machine learning; Restricted Boltzmann machine; Contrastive divergence; Xarxes neuronals (Informàtica); Aprenentatge automàtic; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial |
| Summary: | Gradient-descent learning algorithms for energy-based Boltzmann architectures are in general computationally prohibitive, typically because of the exponential number of terms involved in computing the partition function. One therefore has to resort to approximation schemes for evaluating the gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their learning algorithm, Contrastive Divergence (CD). It is well known that CD has a number of shortcomings and that its approximation to the gradient has several drawbacks. Overcoming these defects has motivated much research, and new algorithms have been devised, such as persistent CD. In this manuscript we propose a new algorithm, which we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. Small as these modifications are, the experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost. |
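
To make the summary's reference to the "negative phase" concrete, the following is a minimal NumPy sketch of one CD-1 gradient estimate for a binary RBM. The summary does not spell out how WCD modifies the negative phase, so the `weighted=True` branch is only an illustrative assumption: it replaces the uniform average over reconstructed samples with weights proportional to each sample's unnormalized model probability within the batch (a softmax of negative free energies, in which the partition function cancels). The function names and parameter layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def free_energy(v, W, b, c):
    # Free energy of a binary RBM: F(v) = -b.v - sum_j log(1 + exp(c_j + v.W[:, j])).
    return -v @ b - np.logaddexp(0.0, v @ W + c).sum(axis=1)


def cd1_gradients(v0, W, b, c, rng, weighted=False):
    """One CD-1 estimate of the log-likelihood gradient.

    v0: (n, n_visible) binary batch, W: (n_visible, n_hidden),
    b: visible biases, c: hidden biases, rng: numpy Generator.
    """
    n = v0.shape[0]

    # Positive phase: hidden activations driven by the data.
    h0 = sigmoid(v0 @ W + c)
    h0_sample = (rng.random(h0.shape) < h0).astype(float)

    # Negative phase: one Gibbs step to obtain reconstructions.
    v1_prob = sigmoid(h0_sample @ W.T + b)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1 = sigmoid(v1 @ W + c)

    if weighted:
        # ASSUMPTION (illustrative only): weight each reconstruction by its
        # relative model probability within the batch.
        logits = -free_energy(v1, W, b, c)
        logits -= logits.max()  # numerical stability
        w = np.exp(logits)
        w /= w.sum()
    else:
        # Standard CD: uniform average over the batch.
        w = np.full(n, 1.0 / n)

    dW = v0.T @ h0 / n - (v1 * w[:, None]).T @ h1
    db = v0.mean(axis=0) - w @ v1
    dc = h0.mean(axis=0) - w @ h1
    return dW, db, dc
```

In both branches the positive phase is identical; only the averaging of the reconstructed samples changes, which is consistent with the summary's statement that the additional computational cost is small.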