This paper proposes a dataset that is more realistic than typical face recognition datasets, because it contains faces captured "in the wild": in a variety of poses relative to the camera, with a variety of expressions, and under illumination of widely varying color. Each face image is associated with a set of names automatically extracted from its caption. Many, but not all, such sets contain the correct name.
This paper shows that quite good face clustering is possible on this dataset, even though its face images are inaccurately and ambiguously labelled. The approach focuses on adopting the kPCA/LDA methodology (kernel PCA followed by linear discriminant analysis) rather than on building a multi-class classifier for face recognition.
Thursday, April 30, 2009
Wednesday, April 29, 2009
[Reading] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
This paper describes a model of object recognition as machine translation. It addresses three questions:
(1) What counts as an object?
(2) Which objects are easy to recognise?
(3) Which objects are indistinguishable using our features?
By viewing object recognition as machine translation (i.e., recognition is the process of annotating image regions with words), the paper answers these three questions as follows:
(1) All words count as objects.
(2) Words that can be reliably attached to image regions are easy to recognise; those that cannot are not.
(3) Words that are predicted with about the same posterior probability given any image group are indistinguishable given the current feature set.
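Read literally, answer (3) compares words by their posteriors across region groups: two words whose posteriors nearly coincide for every group cannot be told apart. A minimal sketch of that reading, assuming a table p[word][group] has already been learned. The function `indistinguishable_pairs`, the toy table and the tolerance `tol` are hypothetical, and the paper's own procedure for merging such words may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def indistinguishable_pairs(
    p: Dict[str, Dict[str, float]], tol: float = 0.05
) -> List[Tuple[str, str]]:
    """Hypothetical helper: p[word][group] is the posterior of word given a region group.

    Returns word pairs whose posteriors differ by at most tol for every group,
    i.e. pairs the current features cannot separate (an illustrative criterion).
    """
    words = sorted(p)
    groups = sorted({g for row in p.values() for g in row})
    pairs = []
    for w1, w2 in combinations(words, 2):
        if all(abs(p[w1].get(g, 0.0) - p[w2].get(g, 0.0)) <= tol for g in groups):
            pairs.append((w1, w2))
    return pairs


# Toy table (made-up numbers): "cat" and "tiger" look alike to the features.
toy = {
    "cat": {"g1": 0.40, "g2": 0.05},
    "tiger": {"g1": 0.38, "g2": 0.06},
    "sky": {"g1": 0.00, "g2": 0.70},
}
print(indistinguishable_pairs(toy))
```

The tolerance is the only knob here; in practice one would more likely compare whole posterior vectors with a distance measure, but the per-group check keeps the idea visible.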
To train the model: first, segment images into regions; second, group the regions into region types using a variety of features; finally, use EM to learn a mapping between region types and the keywords supplied with the images.
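The EM step resembles word-alignment models from statistical machine translation. A minimal sketch in the style of IBM Model 1, assuming each image has already been reduced to a list of region-type ids ("blobs") plus its keywords. The function `learn_lexicon` and the toy corpus are hypothetical, and details such as null assignments and the paper's initialisation are omitted.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Corpus = Sequence[Tuple[Sequence[str], Sequence[str]]]  # (blobs, keywords) per image


def learn_lexicon(corpus: Corpus, iters: int = 20) -> Dict[Tuple[str, str], float]:
    """Model-1-style EM: returns t[(word, blob)] approximating p(word | blob)."""
    words = {w for _, ws in corpus for w in ws}
    blobs = {b for bs, _ in corpus for b in bs}
    t = {(w, b): 1.0 / len(words) for w in words for b in blobs}  # uniform start
    for _ in range(iters):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        for bs, ws in corpus:
            for w in ws:
                # E-step: posterior over which blob in this image produced word w.
                norm = sum(t[(w, b)] for b in bs)
                for b in bs:
                    c = t[(w, b)] / norm
                    count[(w, b)] += c
                    total[b] += c
        # M-step: renormalise the expected counts for each blob.
        t = {(w, b): (count[(w, b)] / total[b] if total[b] > 0 else 0.0) for (w, b) in t}
    return t


corpus = [
    (["sky_blob", "grass_blob"], ["sky", "grass"]),
    (["sky_blob", "water_blob"], ["sky", "water"]),
    (["grass_blob", "water_blob"], ["grass", "water"]),
]
lexicon = learn_lexicon(corpus)
print(max(["sky", "grass", "water"], key=lambda w: lexicon[(w, "sky_blob")]))
```

No single image tells EM which blob means "sky", but co-occurrence across images does: each iteration shifts probability mass toward the consistent pairing.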
[Reading] Algorithms for Fast Vector Quantization
Finding the nearest neighbor (NN) is a problem of significant importance in many applications. One important application is vector quantization, a technique used in the compression of speech and images. The paper shows that, if one is willing to relax the requirement of finding the true NN, running time can be improved significantly at only a very small loss in the performance of the vector quantizer.
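To see where NN search enters vector quantization: encoding a vector means finding the closest codeword, so the encoder's inner loop is exactly an NN query. A minimal sketch using a brute-force linear scan, which costs O(N·d) per vector for N codewords in d dimensions and is the baseline that faster search methods aim to beat. The toy codebook and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(x: Vector, codebook: Sequence[Vector]) -> int:
    """Exact NN by linear scan: index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))


def quantize(data: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], float]:
    """Encode every vector; return the codes and the mean squared distortion."""
    codes = [encode(x, codebook) for x in data]
    distortion = sum(sq_dist(x, codebook[c]) for x, c in zip(data, codes)) / len(data)
    return codes, distortion


codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
codes, distortion = quantize([(0.1, 0.2), (4.8, 5.1), (0.9, 1.2)], codebook)
print(codes, distortion)
```

Replacing `encode` with an approximate search leaves the rest unchanged; the cost of occasionally picking a slightly farther codeword shows up as a small increase in `distortion`.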
This paper presents an empirical study of three NN algorithms on a number of data distributions, in dimensions from 8 to 16:
(1) The standard k-d tree algorithm, enhanced to use incremental distance calculation.
(2) Priority k-d tree search, a further improvement that visits k-d cells in order of their proximity to the query point.
(3) A neighborhood graph search algorithm based on simple greedy search.
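A minimal sketch of algorithm (2), assuming a plain median-split k-d tree with one point per node (the paper's trees, bucket sizes and splitting rules may differ). Cells wait in a priority queue keyed by their squared distance to the query, and that distance is updated incrementally: moving to the far child changes only the offset along the split axis. The relaxation parameter `eps` is an assumption of this sketch: search stops once no remaining cell can beat the current answer by more than a factor of (1 + eps), which is one common way to trade accuracy for speed.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Median-split k-d tree, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def priority_search(root: Optional[Node], q: Point, eps: float = 0.0) -> Tuple[Optional[Point], float]:
    """Return a point within (1 + eps) of the true NN distance; eps = 0 is exact."""
    best, best_sq = None, math.inf
    tie = itertools.count()  # tiebreaker so the heap never compares Node objects
    # Entry: (squared distance from q to the node's cell, tiebreak, node, per-axis offsets)
    heap = [(0.0, next(tie), root, (0.0,) * len(q))] if root is not None else []
    while heap:
        bound, _, node, off = heapq.heappop(heap)
        # Every unvisited cell is at least this far away: stop once none of them
        # can improve the current answer by more than a (1 + eps) factor.
        if bound * (1.0 + eps) ** 2 >= best_sq:
            break
        d = sq_dist(node.point, q)
        if d < best_sq:
            best, best_sq = node.point, d
        a = node.axis
        diff = q[a] - node.point[a]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        if near is not None:
            # The near child's cell is on the query's side: same lower bound.
            heapq.heappush(heap, (bound, next(tie), near, off))
        if far is not None:
            # Incremental distance: only the offset along the split axis changes.
            far_bound = bound - off[a] ** 2 + diff ** 2
            new_off = off[:a] + (abs(diff),) + off[a + 1:]
            heapq.heappush(heap, (far_bound, next(tie), far, new_off))
    return best, math.sqrt(best_sq)
```

With `eps=0` the search is exact. Larger `eps` stops earlier, and the returned distance is at most (1 + eps) times the true NN distance, because every unvisited point lies in a cell at least `sqrt(bound)` away.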
Wednesday, April 1, 2009
[Reading] Latent Dirichlet Allocation
Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus that allows sets of observations to be explained by unobserved groups, which explain why some parts of the data are similar. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA assumes that words are generated by topics and that those topics are infinitely exchangeable within a document (the exchangeability assumption).
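The generative story can be sketched in a few lines: draw topic proportions theta from Dirichlet(alpha), then for each word draw a topic z from theta and a word from that topic's distribution beta[z]. This is a minimal sketch assuming a fixed document length (the paper draws the length from a Poisson) and a hypothetical two-topic vocabulary; the function names are illustrative. Because every word is drawn independently given theta, word order carries no information, which is the exchangeability assumption at work.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    n_words: int,
    alpha: Sequence[float],
    beta: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """beta[k][v] is p(word v | topic k); returns topic proportions and word ids."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topics = range(len(theta))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]  # topic for this word
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # word from topic z
        words.append(w)
    return theta, words


rng = random.Random(0)
beta = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # toy: two topics, disjoint words
theta, words = generate_document(50, [0.5, 0.5], beta, rng)
print([round(t, 3) for t in theta], words[:10])
```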
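The paper uses variational EM to estimate the parameters. It also proposes a variant called smoothed LDA, which places a Dirichlet prior on the topic-word distributions so that words unseen in the training corpus do not receive zero probability (the "zero frequency problem"). Exact inference is intractable for LDA, but any of a large suite of approximate inference algorithms can be used for inference and parameter estimation within the LDA framework. LDA can also be viewed as a dimensionality reduction technique, since each document is summarized by a low-dimensional vector of topic proportions.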
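The E-step of variational EM fits, for each document, a Dirichlet parameter gamma and per-word topic responsibilities phi by alternating the updates phi[n][i] ∝ beta[i][w_n] · exp(digamma(gamma[i])) and gamma[i] = alpha[i] + Σ_n phi[n][i], starting from phi = 1/K and gamma[i] = alpha[i] + N/K. A minimal single-document sketch, assuming alpha and beta are given (the M-step that re-estimates them is not shown) and using a hand-rolled digamma because the Python standard library has none. Function names and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x upward, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def variational_e_step(
    doc: Sequence[int],
    alpha: Sequence[float],
    beta: Sequence[Sequence[float]],
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Tuple[List[float], List[List[float]]]:
    """Fit gamma (variational Dirichlet) and phi (word-topic responsibilities) for one doc."""
    k, n = len(alpha), len(doc)
    phi = [[1.0 / k] * k for _ in range(n)]
    gamma = [a + n / k for a in alpha]
    for _ in range(max_iter):
        # exp(psi(gamma_i)); the exp(-psi(sum(gamma))) factor cancels on normalization.
        weights = [math.exp(digamma(g)) for g in gamma]
        for i, w in enumerate(doc):
            row = [beta[t][w] * weights[t] for t in range(k)]
            s = sum(row)  # assumes some topic gives word w nonzero probability
            phi[i] = [r / s for r in row]
        new_gamma = [alpha[t] + sum(phi[i][t] for i in range(n)) for t in range(k)]
        done = max(abs(g1 - g0) for g1, g0 in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if done:
            break
    return gamma, phi


beta = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # toy topics with disjoint words
gamma, phi = variational_e_step([0, 1, 0, 1, 2], [0.1, 0.1], beta)
print(gamma)
```

The comment on `s` shows why smoothing matters: if a test word had zero probability under every topic, the update would divide by zero. Normalizing `gamma` gives the document's K-dimensional topic representation, which is the dimensionality-reduction view mentioned above.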