Frank's aMMAI: 2009

Friday, June 19, 2009

[Reading] Support Vector Learning for Ordinal Regression

"Learning to rank" is automatically creating a ranking function that assigns scores to instances, then rank the instances by using the scores.

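This paper formalizes learning to rank as a binary classification problem on pairs of instances, and uses a support vector machine (SVM) to learn the binary classifier. The target is the pairwise 0-1 loss (the number of mis-ordered pairs), which the SVM bounds from above with the convex hinge loss.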

The learned function can be used in two ways: (1) as a ranking function: given an instance, output its ranking score; (2) as a classifier: given a pair of instances, output their relative order.
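The pairwise reduction can be illustrated with a short sketch. It assumes a linear model, and it trains the hinge loss with plain subgradient descent in place of a QP solver. All function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical. Each pair with different ranks becomes one difference vector labelled +1 or -1. The learned `w` then serves both as the ranking function `score` and as the pairwise classifier `prefer`.

```python
import itertools

def pairwise_transform(X, y):
    """Turn (instance, rank) data into a binary problem on difference vectors."""
    diffs, labels = [], []
    for i, j in itertools.combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # ties give no ordering constraint
        diffs.append([a - b for a, b in zip(X[i], X[j])])
        labels.append(1 if y[i] > y[j] else -1)
    return diffs, labels

def train_rank_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM on pairwise differences, by subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - label * w.diff))."""
    diffs, labels = pairwise_transform(X, y)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wk for wk in w]
        for d, s in zip(diffs, labels):
            margin = s * sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1:  # hinge is active for this pair
                for k in range(len(w)):
                    grad[k] -= s * d[k] / len(diffs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

def score(w, x):
    """Use (1): the ranking function f(x) = w . x."""
    return sum(wk * xk for wk, xk in zip(w, x))

def prefer(w, xa, xb):
    """Use (2): the pairwise classifier, +1 if xa should rank above xb."""
    return 1 if score(w, xa) > score(w, xb) else -1
```

Because the classifier is linear, classifying the difference `xa - xb` is the same as comparing `score(w, xa)` with `score(w, xb)`. That is why a single `w` gives both views.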

[Reading] The Structure and Function of Complex Networks

This paper reviews recent work on the structure and function of networked systems such as the Internet, the World Wide Web, social networks, networks of citations between papers, and many others. The study of networks, in the form of mathematical graph theory, is one of the fundamental pillars of discrete mathematics. Networks have also been studied extensively in the social sciences.

The paper has three main parts:
(1) Empirical studies of the structure of networks, including social networks, information networks, technological networks and biological networks.
(2) Some of the common properties that are observed in many of these networks, how they are measured, and why they are believed to be important for the functioning of networked systems.
(3) The mathematical modeling of networks, including random graph models and their generalizations, exponential random graphs, and Markov graphs, the small-world model and its variations, and models of growing graphs including preferential attachment models and their many variations.
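One of the common properties measured across these networks is clustering (transitivity). The sketch below computes the global clustering coefficient C = 3 × (number of triangles) / (number of connected triples) from an undirected edge list. Here a connected triple is a vertex joined to two others. The function name and the edge-list input format are assumptions.

```python
from itertools import combinations

def transitivity(edges):
    """Global clustering coefficient of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    corner_hits = 0   # each triangle is seen once from each of its 3 corners
    triples = 0       # connected triples, counted by their centre vertex
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        corner_hits += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # corner_hits already equals 3 * (number of triangles)
    return corner_hits / triples if triples else 0.0
```

A triangle gives C = 1 and a star gives C = 0. A triangle with one pendant edge has 1 triangle and 5 connected triples, so C = 3/5.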

[Reading] Lazy Snapping

This paper presents "Lazy Snapping", an interactive image cutout tool with a novel coarse-to-fine UI design. The task in image cutout is to specify which parts of the image are "foreground" (the part you want to cut out) and which belong to the background.

Lazy Snapping consists of two steps, both of which are formulated as graph cut problems:
(1) a quick object marking step
Object marking (at a coarse scale) specifies the object of interest with a few marking lines. This step is intuitive and quick for specifying the object context. The paper proposes an efficient graph cut algorithm that runs on a pre-computed over-segmentation, so that the marking UI can give users instant visual feedback.
(2) a simple boundary editing step
Boundary editing (at a finer scale, or on the zoomed-in image) lets the user edit the object boundary by simply clicking and dragging polygon vertices. The polygon locations are then used as soft constraints to improve snapping results around ambiguous or low-contrast edges. This step makes accurate boundary control easy and efficient.
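The graph cut formulation can be sketched under several assumptions. This is a much simpler stand-in than the paper's energy: a 1-D row of pixels instead of pre-segmented regions, and a data term equal to the distance from each pixel's intensity to the mean seed intensity. It also uses a hypothetical smoothness weight `lam / (1 + |I_i - I_j|)` between neighbours. The minimum cut comes from an Edmonds-Karp max-flow, which is not necessarily the solver used in the paper. All names are illustrative.

```python
from collections import deque

INF = float("inf")

def min_cut_source_side(n, cap, s, t):
    """Edmonds-Karp max-flow on a dense capacity matrix; returns the nodes
    reachable from s in the final residual graph (source side of a min cut)."""
    flow = [[0.0] * n for _ in range(n)]
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return {v for v in range(n) if parent[v] != -1}
        push, v = INF, t                  # bottleneck of the augmenting path
        while v != s:
            push = min(push, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            flow[parent[v]][v] += push
            flow[v][parent[v]] -= push
            v = parent[v]

def segment_row(pixels, fg_seeds, bg_seeds, lam=1.0):
    """Foreground/background labelling of a row of intensities by one graph cut."""
    n = len(pixels)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    mean_f = sum(pixels[i] for i in fg_seeds) / len(fg_seeds)
    mean_b = sum(pixels[i] for i in bg_seeds) / len(bg_seeds)
    for i, p in enumerate(pixels):
        if i in fg_seeds:
            cap[s][i] = INF               # hard constraint: stays foreground
        elif i in bg_seeds:
            cap[i][t] = INF               # hard constraint: stays background
        else:
            cap[s][i] = abs(p - mean_b)   # paid if i ends up background
            cap[i][t] = abs(p - mean_f)   # paid if i ends up foreground
    for i in range(n - 1):                # cheap to cut across strong edges
        w = lam / (1.0 + abs(pixels[i] - pixels[i + 1]))
        cap[i][i + 1] = cap[i + 1][i] = w
    return min_cut_source_side(n + 2, cap, s, t) - {s}
```

For example, with a bright-to-dark row and seeds at both ends, the cut falls at the strong intensity edge: `segment_row([0.9, 0.85, 0.8, 0.2, 0.1], {0}, {4})` gives `{0, 1, 2}`.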

Friday, June 5, 2009

[Reading] Learning Low-Level Vision

This paper presents a learning-based method for low-level vision problems, i.e. estimating underlying scenes from images, combining the themes of scene estimation and statistical learning. Estimates of underlying scenes are important for various tasks in image analysis, database search, and robotics.

This approach is called VISTA (Vision by Image/Scene TrAining). It works as follows. One specifies prior probabilities on scenes by generating typical examples, creating a synthetic world of scenes and rendered images. The images and scenes are broken into a Markov network, and the network's parameters are learned from the training data. Belief propagation in the Markov network is then used to estimate the scene for a new image.

Solving a Markov network involves two phases. In the learning phase, the parameters of the network connections are learned from training data. In the inference phase, the scene corresponding to particular image data is estimated.
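The inference phase can be illustrated with a minimal sketch. VISTA's super-resolution network is a 2-D grid of patches, which has loops, so belief propagation there is only approximate. The simplest case is a chain, where sum-product belief propagation is exact, and the sketch checks it against brute-force marginals. The names `phi` (local evidence for each candidate scene) and `psi` (compatibility between neighbouring candidates) are assumptions. For a MAP estimate, the sums would be replaced by maxima (max-product).

```python
import itertools
import numpy as np

def chain_bp(phi, psi):
    """Sum-product BP on a chain. phi: (N, K) evidence for K candidates per node;
    psi: list of N-1 (K, K) matrices, psi[i][a, b] = compat(x_i = a, x_{i+1} = b)."""
    n, k = phi.shape
    fwd = [np.ones(k) for _ in range(n)]   # message arriving from the left
    bwd = [np.ones(k) for _ in range(n)]   # message arriving from the right
    for i in range(1, n):
        m = psi[i - 1].T @ (phi[i - 1] * fwd[i - 1])
        fwd[i] = m / m.sum()               # normalise for numerical stability
    for i in range(n - 2, -1, -1):
        m = psi[i] @ (phi[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    beliefs = phi * np.array(fwd) * np.array(bwd)
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_force_marginals(phi, psi):
    """Reference answer: enumerate all K^N joint configurations."""
    n, k = phi.shape
    marg = np.zeros((n, k))
    for xs in itertools.product(range(k), repeat=n):
        p = np.prod([phi[i, x] for i, x in enumerate(xs)])
        p *= np.prod([psi[i][xs[i], xs[i + 1]] for i in range(n - 1)])
        for i, x in enumerate(xs):
            marg[i, x] += p
    return marg / marg.sum(axis=1, keepdims=True)
```

BP costs O(N·K²) while enumeration costs O(K^N). That gap is what makes inference over many candidate scene patches feasible.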

This paper applies VISTA to the "super-resolution" problem (estimating high frequency details from a low-resolution image), showing good results.

In my view, the key point of this paper is where the power of VISTA comes from. First, the large training database allows rich prior probabilities. Second, the selection of scene candidates focuses the computation on scenes that render to the image. Third, Bayesian belief propagation allows efficient inference.

[Reading] An Introduction to Graphical Models

This technical report gives an introduction to graphical models. It describes graphical models as a marriage between probability theory and graph theory.

Graphical models provide a natural tool for dealing with (1) uncertainty and (2) complexity. In particular, graph theory provides the notion of modularity, i.e. a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data.

The graphical model framework provides a way to view many systems as instances of a common underlying formalism, e.g. mixture models, factor analysis, hidden Markov models, and Kalman filters. Probabilistic graphical models are graphs in which nodes represent random variables and arcs represent conditional independence assumptions. They provide a compact representation of joint probability distributions.

There are two main kinds of graphical models: undirected and directed. In a directed graphical model (a Bayesian network), an arc from A to B can be informally interpreted as indicating that A causes B. The conditional independence relationships encoded by the graph allow us to represent the joint distribution more compactly.

The goal of inference is to estimate the values of hidden nodes given the values of the observed nodes. The conditional independence assumptions encoded in the graph can be used to speed up exact inference. The key idea of the variable elimination algorithm (and many others) is to "push" the sums in as far as possible. If we wish to compute several marginals at the same time, we can use dynamic programming to avoid the redundant computation that repeated variable elimination would involve. However, exact algorithms run in time exponential in the size of the largest cluster (the induced width of the graph), and finding an elimination order that minimizes this size is NP-hard. When the largest cluster is large, approximate inference becomes necessary.
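Both the compactness of the factorization and the "push the sums in" idea show up in a tiny example. The sketch uses a binary chain A → B → C → D with made-up conditional probability tables (all numbers are hypothetical). It computes P(D) by summing the full joint, and again by eliminating A, then B, then C.

```python
import itertools
import numpy as np

# Hypothetical CPTs for a binary chain A -> B -> C -> D (rows indexed by the parent).
p_a = np.array([0.6, 0.4])                  # P(A)
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])  # P(B | A)
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])  # P(C | B)
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])  # P(D | C)

def marginal_d_brute_force():
    """Sum P(a) P(b|a) P(c|b) P(d|c) over all 2^4 assignments."""
    p_d = np.zeros(2)
    for a, b, c, d in itertools.product(range(2), repeat=4):
        p_d[d] += p_a[a] * p_b_a[a, b] * p_c_b[b, c] * p_d_c[c, d]
    return p_d

def marginal_d_elimination():
    """Push each sum inward: sum_c P(d|c) sum_b P(c|b) sum_a P(a) P(b|a)."""
    tau_b = p_a @ p_b_a       # eliminate A -> P(B)
    tau_c = tau_b @ p_c_b     # eliminate B -> P(C)
    return tau_c @ p_d_c      # eliminate C -> P(D)
```

The factorized model also needs only 1 + 2 + 2 + 2 = 7 free parameters, instead of 2^4 - 1 = 15 for the full joint table.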

Tuesday, June 2, 2009

[Reading] Rapid Object Detection using a Boosted Cascade of Simple Features

This paper describes a machine learning approach to visual object detection that processes images extremely rapidly while achieving high detection rates. It uses Haar-like features for the weak learners; with the "integral image" representation, these features can be computed very quickly. It uses AdaBoost as the learning algorithm, which selects a small number of the most important features from a larger set and yields extremely efficient, discriminative classifiers. It also proposes a "cascade" framework for efficiently distinguishing between faces and non-faces. Overall, the paper proposes an object detection approach that minimizes computation time while achieving high detection accuracy.
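Here is a minimal sketch of why the integral image makes Haar-like features cheap. After one pass over the image, any rectangle sum takes four lookups, so a two-rectangle feature takes eight. The function names, the zero-padded layout, and the left-minus-right feature orientation are assumptions for illustration.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1]; one row/column of zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

For example, on the 4×4 image with rows `[1, 2, 3, 4]`, `[5, 6, 7, 8]`, `[9, 10, 11, 12]`, `[13, 14, 15, 16]`, the inner 2×2 block `rect_sum(ii, 1, 1, 2, 2)` is 34. The full-image feature `two_rect_feature(ii, 0, 0, 4, 4)` is 60 - 76 = -16.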

[Reading] Normalized Cuts and Image Segmentation

This paper aims to extract the global impression of an image and to provide a hierarchical description of it. It is most closely related to graph-theoretic formulations of grouping. By treating the grouping problem (image segmentation) as a graph partitioning problem, the paper proposes the normalized cut criterion for segmenting the graph.

The normalized cut is an unbiased measure of disassociation between subgroups of a graph. It has the nice property that minimizing it leads directly to maximizing the normalized association, an unbiased measure of total association within the subgroups. It also avoids the minimum cut's unnatural bias toward partitioning out small sets of points.
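For reference, the criteria can be written out as follows. The graph G = (V, E) has edge weights w(u, v) and is partitioned into A and B. The identity on the last line follows from assoc(A, V) = assoc(A, A) + cut(A, B), and likewise for B.

$$
\mathrm{cut}(A,B) = \sum_{u \in A,\, v \in B} w(u,v), \qquad
\mathrm{assoc}(A,V) = \sum_{u \in A,\, t \in V} w(u,t)
$$

$$
\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}
$$

$$
\mathrm{Nassoc}(A,B) = \frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)}, \qquad
\mathrm{Ncut}(A,B) = 2 - \mathrm{Nassoc}(A,B)
$$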

Minimizing the normalized cut exactly is NP-complete. The paper shows that an approximate discrete solution can be found efficiently if the normalized cut problem is embedded in the real-valued domain, because the relaxed problem becomes a generalized eigenvalue problem.
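Here is a minimal sketch of the relaxation, assuming a small dense symmetric affinity matrix `W`. It solves (D - W) y = λ D y through the equivalent symmetric problem with z = D^{1/2} y and takes the second-smallest eigenvector. It then discretizes by trying each value of y as a splitting point and keeping the split with the lowest Ncut. The function names and the exhaustive threshold search are assumptions.

```python
import numpy as np

def ncut_value(W, mask):
    """Ncut(A, B) for A = mask, B = ~mask."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()

def ncut_bipartition(W):
    """Two-way normalized cut via the relaxed problem (D - W) y = lambda D y."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # With z = D^{1/2} y this is the symmetric problem
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z, which eigh can solve.
    L_sym = d_inv_sqrt[:, None] * (np.diag(d) - W) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]               # second-smallest generalized eigenvector
    best_mask, best_val = None, np.inf
    for thr in np.unique(y)[:-1]:             # every split leaves both sides non-empty
        mask = y > thr
        val = ncut_value(W, mask)
        if val < best_val:
            best_mask, best_val = mask, val
    return best_mask, best_val
```

On two well-separated groups of 1-D points with a Gaussian affinity, the returned mask separates the groups, and the Ncut value is close to 0.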

Tuesday, May 5, 2009

[Reading] On Spectral Clustering: Analysis and an algorithm

Spectral clustering methods are algorithms that cluster points using eigenvectors of matrices derived from the data. Essentially, they run k-means in the space spanned by the top eigenvectors of a (normalized) affinity matrix.

This paper presents a simple spectral clustering algorithm and analyzes it. Unlike previous work, which was largely empirical, it provides a theoretical analysis.

The method leaves the user four choices that control the clustering:
(1) Construction of the affinity matrix (usually a Gaussian kernel)
(2) Choice of the scaling parameter σ (this can be done by searching over values and picking the one that gives the tightest clusters)
(3) Choice of k, the number of clusters
(4) Choice of the clustering method applied in the eigenvector space
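Here is a minimal sketch of the algorithm, with choice (1) set to a Gaussian kernel, (2) a fixed `sigma`, and (4) plain k-means. It builds A with A_ii = 0, forms L = D^{-1/2} A D^{-1/2}, stacks the k largest eigenvectors, renormalizes each row to unit length, and clusters the rows. The farthest-point k-means initialisation is an assumption chosen to keep the sketch deterministic. All names are hypothetical.

```python
import numpy as np

def kmeans(Y, k, n_iter):
    """Lloyd's algorithm with farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Y[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))                  # (1) Gaussian affinity, (2) scale
    np.fill_diagonal(A, 0.0)                            # A_ii = 0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)                         # ascending eigenvalues
    Y = vecs[:, -k:]                                    # (3) the k largest eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)    # renormalise each row
    return kmeans(Y, k, n_iter)                         # (4) cluster the rows
```

When the clusters are well separated, each cluster's rows of `Y` collapse to roughly one point on the unit sphere, and the points for different clusters are nearly orthogonal. That is why a simple k-means suffices at the end.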

Thursday, April 30, 2009

[Reading] Names and Faces in the News

This paper proposes a dataset that is more realistic than usual face recognition datasets. It contains faces captured "in the wild" in a variety of configurations with respect to the camera, with a variety of expressions, and under illumination of widely varying color. Each face image is associated with a set of names automatically extracted from the associated caption. Many, but not all, such sets contain the correct name.

This paper shows that quite good face clustering is possible on this dataset, even though its face images are inaccurately and ambiguously labelled. The approach focuses on adopting the kPCA/LDA methodology rather than on building a multi-class classifier for face recognition.

Wednesday, April 29, 2009

[Reading] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

This paper describes a model of object recognition as machine translation. Three issues are addressed:
(1) What counts as an object?
(2) Which objects are easy to recognise?
(3) Which objects are indistinguishable using our features?

By viewing object recognition as machine translation (i.e. recognition is the process of annotating image regions with words), the paper answers these three questions as follows:
(1) All words count as objects.
(2) Words that can be reliably attached to image regions are easy to recognise and those that cannot, are not.
(3) Words that are predicted with about the same posterior probability given any image group - such objects are indistinguishable given the current feature set.

To train the model, first segment the images into regions. Second, vector-quantize the regions into region types ("blobs") using a variety of features. Finally, use EM to learn a mapping between region types and the keywords supplied with the images.
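The EM step can be sketched in the style of IBM Model 1. Each annotation word is assumed to be produced by one of the blobs in its image, and EM alternates between guessing which blob produced each word and re-estimating t(word | blob). This simplification may differ from the paper's exact model. The toy images, blob names, and function name are hypothetical.

```python
from collections import defaultdict

def train_lexicon(images, n_iter=20):
    """EM for t[b][w] = p(word w | blob b). images: list of (blobs, words) pairs."""
    vocab = {w for _, words in images for w in words}
    blob_types = {b for blobs, _ in images for b in blobs}
    t = {b: {w: 1.0 / len(vocab) for w in vocab} for b in blob_types}  # uniform start
    for _ in range(n_iter):
        counts = {b: defaultdict(float) for b in blob_types}
        for blobs, words in images:
            for w in words:
                # E-step: posterior over which blob in this image produced w
                z = sum(t[b][w] for b in blobs)
                for b in blobs:
                    counts[b][w] += t[b][w] / z
        # M-step: renormalise the expected counts for each blob type
        for b in blob_types:
            total = sum(counts[b].values())
            t[b] = {w: counts[b][w] / total for w in vocab}
    return t
```

No image states which blob goes with which word. Consistent co-occurrence across images is enough for EM to concentrate each blob's probability on its matching word, as in the test data below.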

[Reading] Algorithms for Fast Vector Quantization

Finding the nearest neighbor (NN) is a problem of significant importance in many applications. One important application is vector quantization, a technique used in the compression of speech and images. If one is willing to relax the requirement of finding the true NN, this paper shows that it is possible to achieve significant improvements in running time and at only a very small loss in the performance of the vector quantizer.

This paper presents an empirical study of three NN algorithms on a number of data distributions, in dimensions from 8 to 16:
(1) Standard k-d tree algorithm, which has been enhanced to use incremental distance calculation.
(2) Priority k-d tree search, a further improvement that orders search by the proximity of the k-d cell to the query point.
(3) A neighborhood graph search algorithm, based on a simple greedy search.
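Here is a minimal sketch of priority k-d tree search, assuming squared Euclidean distance and a median-split tree that stores one point per node. Cells are visited in order of a lower bound on their distance to the query. That bound is a simpler one than the paper's incremental distance calculation: the largest single-axis gap to a splitting plane on the way down. The hypothetical `max_checks` parameter caps the number of points examined, which turns the exact search into an approximate one in the spirit of the paper's speed/accuracy trade-off.

```python
import heapq

def build_kdtree(points, depth=0):
    """Median-split k-d tree; each node stores one point and a splitting axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def priority_search(root, q, max_checks=None):
    """Exact when max_checks is None; approximate when it caps the work."""
    best, best_d = None, float("inf")
    heap = [(0.0, 0, root)]          # (lower bound, tie-breaker, subtree)
    tie, checks = 1, 0
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break                    # no remaining cell can hold a closer point
        while node is not None:      # descend toward q, queueing the far sides
            if max_checks is not None and checks >= max_checks:
                return best, best_d
            checks += 1
            d = sq_dist(q, node["point"])
            if d < best_d:
                best, best_d = node["point"], d
            gap = q[node["axis"]] - node["point"][node["axis"]]
            near, far = ((node["left"], node["right"]) if gap < 0
                         else (node["right"], node["left"]))
            if far is not None:
                # every point in `far` is at least |gap| away along this axis
                heapq.heappush(heap, (max(bound, gap * gap), tie, far))
                tie += 1
            node = near
    return best, best_d
```

Lowering `max_checks` trades accuracy for time. The returned distance can only be greater than or equal to the true nearest-neighbour distance.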

Wednesday, April 1, 2009

[Reading] Latent Dirichlet Allocation

Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus. It allows sets of observations to be explained by unobserved groups, which explain why some parts of the data are similar. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. It assumes that words are generated by topics and that those topics are infinitely exchangeable within a document (the exchangeability assumption).

It uses variational EM to estimate the parameters. It also places a Dirichlet prior on the topic-word distributions to avoid the "zero frequency problem"; this variant is called smoothed LDA. Exact inference is intractable for LDA, but a large suite of approximate inference algorithms can be used for inference and parameter estimation within the LDA framework. LDA can also be viewed as a dimensionality reduction technique.
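The generative story behind LDA can be made concrete with a sketch that samples a toy corpus. Each document draws a topic mixture θ ~ Dirichlet(α). Each word slot then draws a topic z from θ and a word from that topic's distribution β_z. The paper draws the document length from a Poisson distribution; here it is fixed for simplicity. The function name and toy parameters are assumptions.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, alpha, beta, seed=0):
    """Sample documents from LDA's generative process.
    alpha: (K,) Dirichlet parameter; beta: (K, V) topic-word distributions."""
    rng = np.random.default_rng(seed)
    k, v = beta.shape
    corpus = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)                  # per-document topic mixture
        z = rng.choice(k, size=doc_len, p=theta)      # one topic per word slot
        words = [int(rng.choice(v, p=beta[topic])) for topic in z]
        corpus.append(words)
    return corpus
```

Inference runs this story backwards: given only the words, it estimates θ for each document and β for each topic. That is the step that needs variational EM or another approximate method.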

Tuesday, March 31, 2009

[Reading] Probabilistic Latent Semantic Indexing

pLSA is a novel approach to automated document indexing and information retrieval. It models each word in a document as a sample from a mixture model: each word is generated from a single topic, but different words in a document may be generated from different topics. Each document is represented as a list of mixing proportions for the mixture components.

pLSA is based on the likelihood principle and uses a statistical model called the aspect model to define a proper generative model of the data. Because it directly minimizes word perplexity, it has a better statistical foundation than LSA, and it also outperforms LSA in the experiments. pLSA uses the EM algorithm to identify latent classes, and it is capable of dealing with polysemy and synonymy.
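Here is a minimal sketch of EM for the aspect model P(d, w) = P(d) Σ_z P(z|d) P(w|z). It runs on a small document-word count matrix and records the log-likelihood Σ n(d,w) log P(w|d), which EM should never decrease. This is plain EM, not the tempered EM that Hofmann proposes. It assumes every word occurs somewhere in the corpus, so no probability becomes zero. Names and the toy counts are hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for pLSA. counts: (D, W) matrix of n(d, w); every column must be non-zero."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    log_liks = []
    for _ in range(n_iter):
        # E-step: P(z | d, w) proportional to P(z|d) P(w|z), array shape (D, Z, W)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        p_w_d = joint.sum(axis=1)                      # P(w | d)
        post = joint / p_w_d[:, None, :]
        log_liks.append(float((counts * np.log(p_w_d)).sum()))
        # M-step: re-estimate both tables from expected counts n(d, w) P(z|d, w)
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d, log_liks
```

Each row of `p_z_d` is the document's "list of mixing proportions" described above.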

Wednesday, March 25, 2009

[Reading] Shape Matching and Object Recognition Using Shape Contexts

This paper proposes a simple and robust algorithm for finding correspondences between shapes, measuring their similarity, and exploiting that similarity for object recognition. The approach has three stages: (1) find correspondences between points on the shapes, (2) estimate a transformation, and (3) measure similarity. To solve the correspondence problem, the paper proposes a descriptor named the shape context, which records the distribution of the positions of the other points relative to a given point. The transformation is estimated with a regularized thin plate spline model. The shape distance is a weighted sum of the shape context distance, an appearance distance, and the bending energy. Results are presented for handwritten digits, 3D objects, silhouettes, and trademarks.
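Here is a minimal sketch of the shape context descriptor and the χ² matching cost between two descriptors. For each point, it histograms the other points by log-distance and angle, with distances divided by the mean pairwise distance for scale invariance. The 5 radial × 12 angular bins follow the paper. The radial range (`r_inner`, `r_outer`) and all function names are assumptions, and points outside that range are simply ignored.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of the other points' positions, for every point."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]          # vector from point i to point j
    dist = np.hypot(diff[..., 0], diff[..., 1])
    dist = dist / dist[~np.eye(n, dtype=bool)].mean() # normalise by mean distance
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dist[i, j]) - 1
            if 0 <= r_bin < n_r:                      # skip points outside the range
                t_bin = int(theta[i, j] / (2 * np.pi) * n_theta) % n_theta
                hists[i, r_bin * n_theta + t_bin] += 1
    return hists

def chi2_cost(h1, h2):
    """C = 1/2 * sum_k (h1[k] - h2[k])^2 / (h1[k] + h2[k]) on normalised histograms."""
    h1 = h1 / max(h1.sum(), 1e-12)
    h2 = h2 / max(h2.sum(), 1e-12)
    denom = h1 + h2
    mask = denom > 0
    return 0.5 * float(((h1 - h2)[mask] ** 2 / denom[mask]).sum())
```

Computing `chi2_cost` for every pair of points on two shapes gives the cost matrix that the correspondence step then matches.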

[Reading] Contour and Texture Analysis for Image Segmentation

This paper proposes a general algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. It uses texture features for segmentation: a texture descriptor is a vector of filter bank outputs, and textons are found by clustering these vectors. Affinities are given by the similarity of texton histograms computed over windows whose size is set by the "local scale" of the texture. Having obtained this local measure, the paper uses the spectral graph-theoretic framework of normalized cuts to find partitions.
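Here is a minimal sketch of the texture affinity step, assuming the filter responses and texton centres are already given (the filter bank and clustering are omitted). Each pixel is assigned to its nearest texton, and texton labels are histogrammed inside that pixel's window. The affinity is an exponential of the χ² distance between normalised histograms, which is the form I recall the paper using. The fixed windows, the `sigma` value, and the function names are all hypothetical.

```python
import numpy as np

def texton_histograms(responses, textons, windows):
    """responses: (P, F) filter outputs; textons: (K, F) cluster centres;
    windows: list of index arrays giving the pixels inside each pixel's window."""
    d2 = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)                        # nearest texton per pixel
    k = len(textons)
    return np.array([np.bincount(labels[w], minlength=k) for w in windows], float)

def texture_affinity(hists, sigma=0.2):
    """W_ij = exp(-chi2(h_i, h_j) / sigma) on normalised texton histograms."""
    h = hists / hists.sum(axis=1, keepdims=True)
    num = (h[:, None, :] - h[None, :, :]) ** 2
    den = h[:, None, :] + h[None, :, :]
    safe = np.where(den > 0, den, 1.0)                # avoid 0/0 for empty bins
    chi2 = 0.5 * np.where(den > 0, num / safe, 0.0).sum(-1)
    return np.exp(-chi2 / sigma)
```

The resulting matrix can be passed directly to a normalized cut as the affinity `W`.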

Tuesday, March 10, 2009

[Reading] Nonlinear Dimensionality Reduction by Locally Linear Embedding

This paper introduces locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. LLE recovers global nonlinear structure from locally linear fits by exploiting the local symmetries of linear reconstructions, and is thus able to learn the global structure of nonlinear manifolds.

LLE maps high-dimensional data into a single global coordinate system of lower dimensionality. It builds a neighborhood-preserving mapping from constrained reconstruction weights: each point is reconstructed from its neighbors with weights that sum to one. Because these weights minimize the reconstruction error, they reflect intrinsic geometric properties of the data that are invariant to rotations, rescalings, and translations.

This approach eliminates the need to estimate pairwise distances between widely separated data points. It also avoids the need to solve large dynamic programming problems.
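Here is a minimal sketch of LLE's three steps. First, find K nearest neighbours. Second, solve for the reconstruction weights of each point, constrained to sum to one. Third, take the bottom eigenvectors of M = (I - W)^T (I - W), discarding the constant one. The regularization term, which is needed when K exceeds the input dimension, is a common choice and an assumption here. So are the function and parameter names.

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]     # step 1: K nearest neighbours
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                           # neighbours centred on X_i
        C = Z @ Z.T                                     # local Gram matrix (K x K)
        C += reg * np.trace(C) * np.eye(n_neighbors)    # regularise a singular C
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                     # step 2: weights sum to one
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)                         # ascending eigenvalues
    # step 3: skip the constant eigenvector (eigenvalue ~ 0), keep the next ones
    return vecs[:, 1:n_components + 1], W
```

Only a sparse linear solve per point and one eigenproblem are involved. This matches the point above: no distances between far-apart points and no large dynamic programming problems.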

[Reading] Eigenfaces for Recognition

Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. The eigenvectors of the covariance matrix associated with a large set of normalized face images are called eigenfaces. They are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of possible human faces. This approach is an example of principal component analysis (PCA).
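Here is a minimal sketch of computing eigenfaces with the trick used by Turk and Pentland. There are M training images of N pixels each, with M much smaller than N. The sketch diagonalizes the small M × M matrix A Aᵀ instead of the N × N covariance Aᵀ A, and maps each eigenvector v back to image space as Aᵀ v. The row-per-image layout and the function names are assumptions.

```python
import numpy as np

def eigenfaces(faces, n_components):
    """faces: (M, N) array, one flattened face image per row, with M << N."""
    mean = faces.mean(axis=0)
    A = faces - mean                          # centred data, (M, N)
    small = A @ A.T                           # (M, M) instead of (N, N)
    vals, vecs = np.linalg.eigh(small)
    order = np.argsort(vals)[::-1][:n_components]
    U = A.T @ vecs[:, order]                  # eigenvectors of A^T A, in image space
    U /= np.linalg.norm(U, axis=0)            # unit-length eigenfaces
    return mean, U

def project(face, mean, U):
    """Coordinates of a face in "face space", used for recognition."""
    return U.T @ (face - mean)
```

If A Aᵀ v = λ v, then Aᵀ A (Aᵀ v) = λ (Aᵀ v). So each small eigenvector gives an eigenvector of the full covariance with the same eigenvalue.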

Monday, March 9, 2009

[Reading] Scale & Affine Invariant Interest Point Detectors

This paper proposes a novel approach for detecting interest points that are invariant to scale and affine transformations. The scale invariant detector computes a multi-scale representation for the Harris interest point detector. It then selects points at which a local measure (the Laplacian) is maximal over scales, thereby combining the Harris detector with Laplacian-based scale selection. The detector is then extended to affine invariance by estimating the affine shape of each point's neighborhood. The method iteratively adjusts the location, scale, and shape of every point neighborhood and converges to affine invariant points.
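Here is a minimal sketch of the Laplacian scale-selection step only; the Harris part and the affine adaptation are omitted. At a given point, it evaluates the scale-normalised Laplacian σ²(L_xx + L_yy) of the Gaussian-smoothed image over a set of scales and picks the scale where its magnitude peaks. Separable convolution with a sampled Gaussian and finite-difference derivatives are implementation assumptions, as are the function names. For a Gaussian blob of standard deviation s, the continuous response peaks at σ = s.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with a sampled, normalised kernel."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def normalized_laplacian(img, sigma):
    """sigma^2 * (L_xx + L_yy) of the smoothed image, via finite differences."""
    L = gaussian_blur(img, sigma)
    Lyy = np.gradient(np.gradient(L, axis=0), axis=0)
    Lxx = np.gradient(np.gradient(L, axis=1), axis=1)
    return sigma ** 2 * (Lxx + Lyy)

def characteristic_scale(img, y, x, sigmas):
    """Scale at which |normalised Laplacian| at (y, x) is maximal over sigmas."""
    responses = [abs(normalized_laplacian(img, s)[y, x]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]
```

The image should be larger than the largest kernel (2·⌈3σ⌉ + 1 pixels), so that `np.convolve` in "same" mode keeps the image size.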

[Reading] Distinctive Image Features from Scale-Invariant Keypoints

This paper presents SIFT (Scale Invariant Feature Transform), a method for extracting distinctive invariant features from images that provide a basis for object and scene recognition. SIFT is a carefully designed procedure with empirically determined parameters for obtaining invariant and distinctive features.

SIFT has the following four stages (the first two act as a detector, the last two as a descriptor):
(1) Scale-space extrema detection
Use a difference-of-Gaussian (DoG) function to identify potential interest points that are invariant to scale.
(2) Keypoint localization
Detailed fitting for sub-pixel accuracy, and further selection based on stability.
(3) Orientation assignment
Each keypoint is assigned one or more orientations based on local gradient directions; because later computations are relative to this orientation, the features are rotation invariant.
(4) Keypoint descriptor
Create an array of orientation histograms.

The SIFT keypoints are invariant to image scale and rotation and robust across a substantial range of affine distortion, addition of noise, and change in illumination.
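The orientation assignment stage can be sketched as a histogram of gradient directions over the keypoint's patch. It uses 36 bins of 10 degrees, each vote weighted by gradient magnitude, and the peak bin gives the keypoint's orientation. Lowe additionally applies a Gaussian window, interpolates the peak, and adds extra orientations for secondary peaks. This sketch omits those steps and uses bins centred on multiples of 10 degrees, which is an assumption. The function name is hypothetical, and angles are in image coordinates (y pointing down).

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Peak of a magnitude-weighted gradient-direction histogram, in degrees."""
    gy, gx = np.gradient(patch.astype(float))       # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bin_width = 360.0 / n_bins
    bins = np.round(angle / bin_width).astype(int) % n_bins   # nearest bin centre
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return int(np.argmax(hist)) * 360 // n_bins
```

A patch whose intensity increases to the right gives 0 degrees. Its transpose, which increases downward, gives 90 degrees. Rotating the patch moves the peak, so expressing the descriptor relative to the peak makes it rotation invariant.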

Saturday, February 21, 2009

[Reading] Image Retrieval: Ideas, Influences, and Trends of the New Age

I chose to summarize Section 3, "Image Retrieval Techniques: Addressing the Core Problem" (p. 14), through Section 3.2, "Image Similarity Using Visual Signature" (p. 30).

At its core, CBIR technology amounts to two problems: (a) the design of the image description (signature), and (b) the similarity measure between two image descriptions. In recent years, much progress has been made in the design of features and of the signatures constructed from them. In addition, machine learning techniques have become more popular and more important in CBIR.

Signature Extraction

Feature extraction is the first step; after extracting features from an image, we construct a signature from them. There are two ways to construct a signature: (a) with segmentation as a first step, and (b) a segmentation-free approach.

To acquire a region-based signature, image segmentation is needed, and several methods have been proposed for segmenting medical images. The results of segmentation-based approaches can be overly sensitive to segmentation quality, and several methods try to address this problem.

Computing global features is efficient, but global features are insensitive to location. A better approach (and a current trend) is to compute local features and then summarize them. Several types of local features are discussed, such as color, texture, shape, spatial modeling, and interest points.

When the number of candidate features is too large to choose from by hand, machine learning techniques can be used for feature selection.

For region-based signatures, several signature construction methods have been proposed; many of them are related to histograms.

Similarity

There are three types of signatures: (a) region-based signatures, (b) feature vectors, and (c) summaries of local feature vectors. Different types of signatures call for different similarity measures. For (a), the definition of the distance between "sets of vectors" is crucial. For (b), several recent efforts measure distance on a manifold, because geodesic distance is a more reasonable measure. For (c), codebooks and probability density functions have been used as signatures.

For region-based signatures, there are basically two formulations for computing similarity. One is a weighted sum of pairwise region distances, where different constraints lead to different designs of the weights. The other uses the Hausdorff distance. Recently, several improvements have been made, including feature tuning, weight computation, robustness against inaccurate segmentation, and faster retrieval.
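The second formulation, the Hausdorff distance between two sets of region feature vectors, can be sketched as follows. It is the larger of the two directed distances, where a directed distance is the worst case over one set of the distance to the nearest member of the other set. The Euclidean metric on feature vectors and the function names are assumptions.

```python
import math

def directed_hausdorff(A, B):
    """max over a in A of the distance from a to its nearest b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two sets of region feature vectors."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

Because it is driven by the single worst-matched region, the plain Hausdorff distance is sensitive to one badly segmented region. This is one motivation for the robustness improvements mentioned above.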

For feature vectors, similarity is computed nonlinearly along the manifold; typical methods are locally linear embedding (LLE), Isomap, and multidimensional scaling (MDS).

[Reading] How to give a good research talk

This article gives suggestions for giving a 30-60 minute presentation. Since it says "make what is useful for you, and ignore the rest", I only summarize the parts useful to me.
First, using examples is important. Always illustrate an idea (theorem, definition, ...) WITH an example.
Second, treat the more important aspects in more detail than the others. Also, don't read your slides; talk ABOUT what's on them.
Last, avoid too much introductory material such as previous work. Also, giving an outline of the talk is sometimes not appropriate.

[Reading] How to Read a Paper

This paper proposes a three-pass method for reading papers:
(1) The first pass (5-10 min) gives you a general idea of the paper by answering the five Cs for yourself.
(2) The second pass (about 1 hr) lets you grasp the content but not the details. At this stage you should be able to summarize the paper!
(3) The third pass (4-5 hr) helps you understand the paper in depth. The key is to attempt to "virtually re-implement" it.

This paper also describes how to use the method to do a literature survey in three steps:
(1) Use a search engine and read the "RELATED WORK" sections of what you find.
(2) Find key citations and key researchers' recent publications.
(3) Quickly scan the top conferences' recent papers.
