This paper describes a model of object recognition as machine translation. It addresses three issues:
(1) What counts as an object?
(2) Which objects are easy to recognise?
(3) Which objects are indistinguishable using our features?
The paper views object recognition as machine translation (i.e. recognition is a process of annotating image regions with words) and answers these three questions respectively:
(1) All words count as objects.
(2) Words that can be reliably attached to image regions are easy to recognise; those that cannot are not.
(3) Words that are predicted with about the same posterior probability given any image group are indistinguishable under the current feature set.
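To make answer (3) concrete, here is a minimal sketch of how one might flag indistinguishable words from a table of posteriors. The function name `indistinguishable_pairs`, the tolerance `tol`, the pairwise comparison, and the toy numbers are all assumptions made for illustration. The paper's actual procedure may differ, and "image group" is read here as a region type.

```python
def indistinguishable_pairs(posteriors, tol=0.05):
    """Return word pairs whose posteriors differ by at most `tol` everywhere.

    posteriors: {region_type: {word: p(word | region_type)}}
    `tol` is a hypothetical threshold, not a value from the paper.
    """
    words = sorted({w for row in posteriors.values() for w in row})
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            # Two words are flagged only if they are close for every region type.
            if all(abs(row.get(a, 0.0) - row.get(b, 0.0)) <= tol
                   for row in posteriors.values()):
                pairs.append((a, b))
    return pairs


# Toy posteriors (made-up numbers): "cat" and "tiger" are predicted
# almost identically for both region types, "sky" is not.
toy_posteriors = {
    "r1": {"cat": 0.40, "tiger": 0.42, "sky": 0.18},
    "r2": {"cat": 0.10, "tiger": 0.08, "sky": 0.82},
}
pairs = indistinguishable_pairs(toy_posteriors)
```

With these toy numbers, only "cat" and "tiger" are flagged, which suggests the features cannot separate them and the two words could be merged.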
Training the model takes three steps. First, segment the images into regions. Second, classify the regions into region types using a variety of features. Last, use EM to learn a mapping between the region types and the keywords supplied with the images.
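The last step can be sketched as follows, assuming segmentation and region typing are already done and each image is reduced to a list of region types plus its keywords. This is a simplified EM in the style of IBM Model 1 from machine translation, not necessarily the exact formulation in the paper. The function name `train_translation_table`, the iteration count, and the toy corpus are hypothetical.

```python
from collections import defaultdict


def train_translation_table(corpus, n_iters=20):
    """Estimate t[region_type][word] = p(word | region_type) with EM.

    corpus: list of (region_types, keywords) pairs, one per image.
    """
    words = {w for _, kws in corpus for w in kws}
    region_types = {r for rts, _ in corpus for r in rts}
    # Start from a uniform table: every region type is equally likely
    # to translate to every word.
    t = {r: {w: 1.0 / len(words) for w in words} for r in region_types}

    for _ in range(n_iters):
        counts = {r: defaultdict(float) for r in region_types}
        # E-step: share each keyword among the regions of its image,
        # in proportion to the current translation probabilities.
        for rts, kws in corpus:
            for w in kws:
                norm = sum(t[r][w] for r in rts)
                for r in rts:
                    counts[r][w] += t[r][w] / norm
        # M-step: renormalise the expected counts per region type.
        for r in region_types:
            total = sum(counts[r].values())
            if total > 0:
                t[r] = {w: counts[r][w] / total for w in words}
    return t


# Toy corpus (made up): the second image disambiguates the first.
toy_corpus = [
    (["blue", "green"], ["sky", "grass"]),
    (["blue"], ["sky"]),
]
table = train_translation_table(toy_corpus)
best_for_green = max(table["green"], key=table["green"].get)
```

In the toy corpus, the image containing only a "blue" region and the keyword "sky" pulls that pairing together, so EM shifts "grass" toward the "green" region type even though no image labels it directly.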