AISB 2000: Time for AI and Society

How to train a community of stochastic generative models

Geoffrey Hinton
Gatsby Computational Neuroscience Unit
University College London

It is possible to combine multiple probabilistic models of the same data by multiplying their probabilities together and then renormalizing. This is a very efficient way to model data that simultaneously satisfies many different constraints. Each individual model can focus on giving high probability to data vectors that satisfy just one of the constraints. Data vectors that satisfy this one constraint but violate other constraints will be ruled out by their low probability under the other models. For example, one model can generate images that have the approximate overall shape of the digit 2, and other more local models can ensure that local image patches contain segments of stroke with the correct fine structure. Or one model of a word string can ensure that the tenses agree and another can ensure that subject and verb agree in number. Training a product of models appears difficult because, in addition to maximizing the probabilities that the individual models assign to the observed data, it is necessary to minimize the normalization term by making the models disagree on unobserved data. Fortunately, there is an efficient way to train a product of models. Some examples of product models trained in this way will be described, and I will show that they extract interesting structure from images.
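The combination rule described above — multiply the experts' probabilities and renormalize — can be sketched over a toy discrete domain. The experts and candidate vectors below are illustrative assumptions, not examples from the talk: each expert favours the candidates satisfying its own constraint, and the product concentrates probability on the candidate that satisfies both.

```python
import numpy as np

def product_of_experts(expert_probs):
    """Combine experts by elementwise multiplication and renormalization.

    expert_probs: array of shape (n_experts, n_candidates), each row a
    (possibly unnormalized) distribution over the same candidates.
    """
    combined = np.prod(expert_probs, axis=0)  # multiply probabilities
    return combined / combined.sum()          # renormalize

# Toy example: two experts over four candidate data vectors.
# Expert A's constraint is satisfied by candidates 0 and 1;
# expert B's constraint is satisfied by candidates 1 and 2.
# Only candidate 1 satisfies both, so it dominates the product.
expert_a = np.array([0.4, 0.4, 0.1, 0.1])
expert_b = np.array([0.1, 0.4, 0.4, 0.1])
posterior = product_of_experts(np.array([expert_a, expert_b]))
print(posterior)  # → [0.16 0.64 0.16 0.04]
```

Note how each expert alone is quite unsure (0.4 vs 0.4), yet the product is sharply peaked: candidates that violate either constraint are vetoed by that expert's low probability, which is the behaviour the abstract describes.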

last modified 20th January 2000

The contents of this web page are not covered by the AISB creative commons licence.