Statistical Analysis of Massive Data Streams: Proceedings of a Workshop (2004)

Chapter: II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES

Suggested Citation: "II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES." National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11098.

Incorporating invariants in Mahalanobis distance based classifiers: Application to Face Recognition

Andrew M. Fraser

Portland State University and Los Alamos National Laboratory

Nicolas W. Hengartner, Kevin R. Vixie, and Brendt E. Wohlberg

Los Alamos National Laboratory

Los Alamos, NM 87545

USA

Abstract—We present a technique for combining prior knowledge about transformations that should be ignored with a covariance matrix estimated from training data to make an improved Mahalanobis distance classifier. Modern classification problems often involve objects represented by high-dimensional vectors or images (for example, sampled speech or human faces). The complex statistical structure of these representations is often difficult to infer from the relatively limited training data sets that are available in practice. Thus, we wish to efficiently utilize any available a priori information, such as transformations of the representations with respect to which the associated objects are known to retain the same classification (for example, spatial shifts of an image of a handwritten digit do not alter the identity of the digit). These transformations, which are often relatively simple in the space of the underlying objects, are usually non-linear in the space of the object representation, making their inclusion within the framework of a standard statistical classifier difficult. Motivated by prior work of Simard et al., we have constructed a new classifier which combines statistical information from training data and linear approximations to known invariance transformations. When tested on a face recognition task, performance was found to exceed by a significant margin that of the best algorithm in a reference software distribution.

I. INTRODUCTION

The task of identifying objects and features from image data is central in many active research fields. In this paper we address the inherent problem that a single object may give rise to many possible images, depending on factors such as the lighting conditions, the pose of the object, and its location and orientation relative to the camera. Classification should be invariant with respect to changes in such parameters, but recent empirical studies [1] have shown that the variation in the images produced from these sources for a single object is often of the same order of magnitude as the variation between different objects.

Inspired by the work of Simard et al. [2] [3], we think of each object as generating a low dimensional manifold in image space by a group of transformations corresponding to changes in position, orientation, lighting, etc. If the functional form of the transformation group is known, we could in principle calculate the entire manifold associated with a given object from a single image of it. Classification based on the entire manifold, instead of a single point, leads to procedures that are invariant to changes in instances induced by that group of transformations. The procedures we describe here approximate such a classification of equivalence classes of images. They are quite general and we expect them to be useful in many contexts outside of face recognition and image processing where transformations to which classification should be invariant occur. For example, they provide a framework for classifying near field sonar signals by incorporating Doppler effects in an invariant manner. Although the procedures are general, in the remainder of the paper we will use the terms faces or objects and image classification for concreteness.

Of course, there are difficulties. Since the manifolds are highly nonlinear, finding the manifold to which a new point belongs is computationally expensive. For noisy data, the computational problem is further compounded by the uncertainty in the assigned manifold.

To address these problems, we use tangents to the manifolds at selected points in image space. Using first and second derivatives of the transformations, our procedures provide substantial improvements over current image classification methods.

II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES

Here we outline our approach. For a more detailed development, see [4]. We start with the standard Mahalanobis distance classifier

\[
\hat{k}(Y) = \arg\min_k \, (Y - \mu_k)^T C_w^{-1} (Y - \mu_k),
\]

where Cw is the within class covariance pooled over all of the classes, μk is the mean for class k, and Y is the image to be classified. We incorporate the known invariances while retaining this classifier structure by augmenting the within class covariance Cw to obtain class specific covariances Ck, one for each class k. We design the augmentations to allow excursions in directions tangent to the manifold generated by the transformations to which the classifier should be invariant. We have sketched a geometrical view of our approach in Fig. 1.
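As a concrete illustration, the baseline rule can be written in a few lines of code. This is a minimal sketch, not the authors' implementation; the function name, array names, and shapes are assumptions made for the example.

```python
import numpy as np

def mahalanobis_classify(Y, means, C_w):
    """Assign Y to the class k minimizing (Y - mu_k)^T C_w^{-1} (Y - mu_k).

    Y     : (D,) image flattened to a vector
    means : (K, D) array whose k-th row is the class mean mu_k
    C_w   : (D, D) pooled within class covariance
    """
    C_w_inv = np.linalg.pinv(C_w)      # pseudo-inverse guards against rank deficiency
    diffs = means - Y                  # row k holds mu_k - Y
    dists = np.einsum('kd,de,ke->k', diffs, C_w_inv, diffs)
    return int(np.argmin(dists))
```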

Denote the transformations with respect to which invariance is desired by τ(Y, θ), where Y and θ are the image and transform parameters respectively. The second order Taylor series for the transformation is

\[
\tau(Y, \theta) = Y + V\theta + \tfrac{1}{2}\, \theta^T H \theta + R(Y, \theta),
\]

where V is the matrix of first derivatives (tangent vectors), H is the tensor of second derivatives, and R is the remainder.
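The text does not specify how the derivatives of τ are obtained; one simple possibility, shown here only as an illustrative sketch, is to approximate the tangent matrix V column by column with central finite differences in each transform parameter. The function name and step size are assumptions.

```python
import numpy as np

def tangent_matrix(tau, Y, n_params, eps=1e-3):
    """Approximate V = d tau(Y, theta) / d theta at theta = 0 by central differences.

    tau      : callable tau(Y, theta) returning the transformed image as a (D,) vector,
               with tau(Y, 0) == Y
    Y        : (D,) image vector
    n_params : dim(Theta), the number of transform parameters
    """
    V = np.zeros((Y.shape[0], n_params))
    for i in range(n_params):
        step = np.zeros(n_params)
        step[i] = eps
        V[:, i] = (tau(Y, step) - tau(Y, -step)) / (2.0 * eps)   # central difference
    return V
```

Second differences of τ give an analogous approximation to the tensor H of second derivatives.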


Fig. 1. A geometrical view of classification with augmented covariance matrices: The dots represent the centers μk about which approximations are made, the curves represent the true invariant manifolds, the straight lines represent tangents to the manifolds, and the ellipses represent the pooled within class covariance Cw estimated from the data. A new observation Y is assigned to a class using \( \hat{k}(Y) = \arg\min_k (Y - \mu_k)^T C_k^{-1} (Y - \mu_k) \). The novel aspect is our calculation of Cθ,k, where α is a parameter corresponding to a Lagrange multiplier, and Cθ,k is a function of the tangent and curvature of the manifold (from the first and second derivatives respectively), with weighting of directions according to relevance estimated by diagonalizing Cw.

We define

\[
C_k \equiv C_w + V_k\, C_{\theta,k}\, V_k^T \qquad (1)
\]

where Cθ,k is a dim(Θ)×dim(Θ) matrix and Vk is the matrix of tangent vectors at μk. We require that Cθ,k be non-negative definite. Consequently the term \( V_k C_{\theta,k} V_k^T \) is also non-negative definite. When Ck is used as the metric, the effect of the term \( V_k C_{\theta,k} V_k^T \) is to discount displacement components in the subspace spanned by Vk, and the degree of the discount is controlled by Cθ,k. We developed [4] our treatment of Cθ,k by thinking of θ as having a Gaussian distribution and calculating expected values with respect to its distribution. Here we present some of that treatment, minimizing the probabilistic interpretation. Roughly, Cθ,k characterizes the costs of excursions of θ. We choose Cθ,k to balance the conflicting goals

Big:

We want to allow θ to be large so that we can classify images with large displacements in the invariant directions.

Small:

We want θ to be small so that the truncated Taylor series will be a good approximation.

We search for a resolution of these conflicting goals in terms of a norm on θ and the covariance Cθ,k. For the remainder of this section let us consider a single individual k and drop the extra subscript, i.e., we will denote the covariance of θ for this individual by Cθ.
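Assuming, consistent with Eqn. (1) and Fig. 1, that the augmented covariance takes the form \( C_k = C_w + V_k C_{\theta,k} V_k^T \), the resulting classifier can be sketched as follows; the variable names are illustrative rather than the authors' code.

```python
import numpy as np

def augmented_classify(Y, means, C_w, tangents, C_thetas):
    """Classify Y using class specific covariances C_k = C_w + V_k C_theta_k V_k^T.

    means    : (K, D) class means mu_k
    tangents : list of K tangent matrices V_k, each of shape (D, dim(Theta))
    C_thetas : list of K non-negative definite matrices C_theta_k, dim(Theta) x dim(Theta)
    """
    dists = []
    for mu_k, V_k, C_theta_k in zip(means, tangents, C_thetas):
        C_k = C_w + V_k @ C_theta_k @ V_k.T      # assumed form of Eqn. (1)
        diff = Y - mu_k
        dists.append(diff @ np.linalg.solve(C_k, diff))
    return int(np.argmin(dists))
```

Enlarging Cθ,k enlarges Ck in the directions spanned by Vk, so displacements along the approximate invariant manifold are penalized less in the resulting metric.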

If, for a particular image component d, the Hessian Hd has both a positive eigenvalue λ1 and a negative eigenvalue λ2, then the quadratic term \( \theta^T H_d \theta \) is zero along a direction e0 that is a linear combination of the corresponding eigenvectors. We suspect that higher order terms will contribute significant errors for such excursions, so we eliminate the canceling effect by replacing Hd with its positive square root, i.e., if an eigenvalue λ of Hd is negative, we replace it with −λ. This suggests the following mean root square norm

(2)

Consider the following objection to the norm in Eqn. (2). If there is an image component d which is unimportant for recognition and for which Hd is large, e.g. a sharp boundary in the background, then requiring this norm to be small might prevent parameter excursions that would only disrupt the background. To address this objection, we use the eigenvalues of the pooled within class covariance matrix Cw to quantify the importance of the components. If there is a large within class variance in the direction of component d, we will not curtail particular parameter excursions just because they cause errors in component d.
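The eigenvalue flip described above can be written compactly; this is a sketch of the operation on a single symmetric Hd, not a reproduction of the authors' code.

```python
import numpy as np

def positive_square_root(H_d):
    """Replace each negative eigenvalue lambda of the symmetric matrix H_d with -lambda."""
    w, U = np.linalg.eigh(H_d)              # H_d = U diag(w) U^T
    return U @ np.diag(np.abs(w)) @ U.T     # same eigenvectors, eigenvalues |lambda|
```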

We develop our formula for Cθ in terms of the eigendecomposition of the pooled within class covariance Cw as follows. Break the dim(Θ)×dim(γ)×dim(Θ) tensor H into components

(3)

Then for each component, define the dim(Θ)×dim(Θ) matrix

(4)

and take the average to get

(5)

Define the norm
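The displayed equations (3)-(5) and the norm itself are not reproduced in this excerpt. Purely as a hypothetical reading of the steps described above (project H onto the eigendirections of Cw, remove eigenvalue cancellation as in the previous sketch, discount directions with large within class variance, and average), one might compute an averaged curvature matrix along the following lines; the exact scaling by the eigenvalues of Cw and the final averaging are assumptions, not the authors' formulas.

```python
import numpy as np

def average_curvature_matrix(H, C_w, floor=1e-12):
    """Hypothetical sketch of Eqns. (3)-(5): an averaged, positive, Cw-weighted curvature matrix.

    H   : (n_params, D, n_params) tensor of second derivatives of tau
    C_w : (D, D) pooled within class covariance
    """
    sigma2, E = np.linalg.eigh(C_w)                  # eigenvalues and eigenvectors of C_w
    n_params = H.shape[0]
    G = np.zeros((n_params, n_params))
    for d in range(E.shape[1]):
        H_d = np.einsum('igj,g->ij', H, E[:, d])     # component of H along eigendirection e_d (cf. Eqn. (3))
        w, U = np.linalg.eigh((H_d + H_d.T) / 2.0)
        H_d_pos = U @ np.diag(np.abs(w)) @ U.T       # eliminate cancellation between curvature directions
        G += H_d_pos / max(sigma2[d], floor)         # assumed discount where within class variance is large
    return G / E.shape[1]                            # average over components (cf. Eqn. (5))
```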
