

Gesture Recognition

Humans use gestures, especially hand gestures, for day-to-day communication. These gestures are ingrained from childhood and are therefore inherently intuitive. Using gestures in computer software lets the user interact with a computer in a more natural and intuitive fashion.

Fu, who wrote a paper on statistical pattern classification, stated that:

The problem of pattern recognition usually denotes a discrimination or classification of events.

A gesture recognizer bases its discrimination or classification on the spatiotemporal changes observed as the gesture progresses. A recognizer generally has three components:

Encoding - the representation of the gesture
Classification - the injection of the supported gestures into the recognizer using ideal situations and randomization or by example
Recognition - the matching of observations to gestures

Encoding

The encoding (representation) of a gesture is important, as a well-chosen transformation can result in a very efficient system. Shapes can be recognized by encoding the data in such a way that the resulting encoding is a pattern that matches a specific gesture. Separating gestures into categories, or families, allows a system to represent each family by the features its gestures have in common.

Size Functions

Frosini proposed a theory that sign language could be recognized by representing the shapes using size functions. The recognition of sign language has been successfully implemented by these representations.

A size function is generated by mapping the observations to some measurement system. The choice of measurement system matters, because a poor choice can map different gestures to the same size function. The space of possible observed gesture data is effectively infinite, and the ability of the classifier to recognize a gesture depends on the size function's ability to map the 'same' gestures to the same encoding. Due to the physical nature of the system, the 'same' gesture can vary, both globally and locally, in its displacement, rotation and scale. These variances need to be taken into account by normalizing the gesture. A simple example of a size function is a mapping of the distance of each of the gesture's coordinates from a reference point.
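The simple distance-based mapping described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes two-dimensional points, takes the centroid as the reference point, and normalizes scale by the farthest point. The names `normalize` and `distance_signature` are hypothetical.

```python
import math

def normalize(points):
    """Remove displacement and scale (assumed scheme: centroid and max radius)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Guard against a degenerate gesture where every point is the centroid.
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def distance_signature(points):
    """Distance of each normalized point from the reference point (the centroid)."""
    return [math.hypot(x, y) for x, y in normalize(points)]

small_square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted_big_square = [(10, 10), (14, 10), (14, 14), (10, 14)]
print(distance_signature(small_square))
print(distance_signature(shifted_big_square))
```

Because distances from the centroid do not change under rotation, this particular measurement is also rotation invariant. The same property means it cannot tell apart gestures that differ only in orientation, which is the kind of collision the paragraph above warns about.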

Approximate Directional Vectors

An approximate directional vector is simply the directional vector between two consecutive points, rounded to the nearest of a predefined set of principal directions.
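As an illustration, the rounding step might look like the sketch below. It assumes two-dimensional points and eight principal directions numbered counter-clockwise from the positive x axis; both choices, and the names `direction_code` and `encode`, are assumptions made for the example.

```python
import math

def direction_code(p, q, n_directions=8):
    """Index of the principal direction nearest to the vector from p to q.

    Direction 0 points along +x; indices increase counter-clockwise.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    sector = 2 * math.pi / n_directions
    return round(angle / sector) % n_directions

def encode(points, n_directions=8):
    """Encode a path as direction codes, skipping repeated identical points."""
    return [
        direction_code(p, q, n_directions)
        for p, q in zip(points, points[1:])
        if p != q
    ]

# A counter-clockwise square: right, up, left, down.
print(encode([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))  # [0, 2, 4, 6]
```

The resulting sequence of small integers is a discrete encoding that a template matcher or a discrete Hidden Markov Model can consume directly.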

Finite State Automata

Using Finite State Automata for template matching is the simplest approach to recognizing gestures. The observed values are compared with the template values, and a gesture is recognized when the resulting transitions between states lead to a known output (an accepting state).
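A minimal sketch of such a template matcher is shown below. It assumes the gesture has already been encoded as a sequence of discrete symbols (for example, direction codes) and builds a linear automaton in which state i means "the first i template symbols have been seen". The self-loop on a repeated symbol is a design choice made for the example, and `make_recognizer` is a hypothetical name.

```python
def make_recognizer(template):
    """Build a linear finite state automaton for one gesture template."""
    def recognize(observations):
        state = 0
        for symbol in observations:
            if state < len(template) and symbol == template[state]:
                state += 1      # transition to the next state
            elif state > 0 and symbol == template[state - 1]:
                continue        # self-loop: the last matched symbol repeats
            else:
                return False    # no valid transition: reject
        # Accept only if the final (accepting) state was reached.
        return state == len(template)
    return recognize

square = make_recognizer([0, 2, 4, 6])
print(square([0, 0, 2, 2, 2, 4, 6, 6]))  # True
print(square([0, 2, 6]))                 # False
```

The self-loop lets one template match gestures drawn at different speeds, since a slower movement only produces more repeats of each symbol.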

Hidden Markov Models

A three-dimensional gesture may be recognized and processed using a Hidden Markov Model, which is a simple dynamic Bayesian network. This spatiotemporal model reduces the three-dimensional complexity of the hand gesture to a two-dimensional problem, and analyses and categorizes the gestures using a state machine. The model has been employed with success in speech and handwriting recognition.

Learning

The learning task of the Hidden Markov Model is to determine, by maximum likelihood, the best set of state transition and output probabilities given a set of output sequences. No tractable algorithm is known for solving this problem exactly. However, the Baum-Welch algorithm is often used to efficiently derive a local maximum likelihood.

Recognition

The recognition process tackles the problem of deciding whether an observed set can be described by the Hidden Markov Model. This is achieved by calculating the probability of the observed set, given the parameters of the model. If this probability is above a predefined threshold, the gesture is recognized as part of the model.
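The probability of an observed sequence given the model parameters is commonly computed with the forward algorithm. The sketch below assumes a discrete Hidden Markov Model whose parameters are plain nested lists and a non-empty observation sequence; the parameter names (`start`, `trans`, `emit`) and the example model are hypothetical.

```python
def sequence_probability(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete HMM.

    start[i]    probability of starting in state i
    trans[i][j] probability of moving from state i to state j
    emit[i][k]  probability of state i emitting symbol k
    obs         non-empty list of symbol indices
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)

def is_recognized(obs, model, threshold):
    """Accept the gesture when its probability is above the threshold."""
    return sequence_probability(obs, *model) > threshold

# Toy two-state model: state 0 emits symbol 0, state 1 emits symbol 1.
model = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(sequence_probability([0, 1], *model))  # 0.5
print(is_recognized([1, 0], model, 0.1))     # False
```

For long observation sequences the raw probabilities underflow, so practical implementations usually rescale `alpha` at each step or work with log probabilities.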

Other Methods

Artificial Neural Networks

Artificial neural networks are computational models that simulate aspects of a biological neural network. Artificial neural networks are built up as a collection of node layers:

An input layer
A series of hidden layers
An output layer

The outputs passed between nodes are determined by weights, which are learnt from examples using back propagation. A neural network does not generally perform as well as the more specialized Hidden Markov Models, but the approach has been used with some success in the recognition of sign language gestures: a paper demonstrating the use of size functions used an artificial neural network as its classifier.
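The layered structure can be illustrated with a forward pass through a small fully connected network. This sketch only shows how weighted outputs flow from the input layer through the hidden layers to the output layer; it does not implement back propagation, and the sigmoid activation, the layer sizes and the function names are assumptions made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases):
    """One fully connected layer: weights[j][i] links input i to node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_output(activation, weights, biases)
    return activation

# Two inputs -> three hidden nodes -> one output node, with untrained weights.
hidden = ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
output = ([[0.0, 0.0, 0.0]], [0.0])
print(forward([1.0, 2.0], [hidden, output]))  # [0.5]
```

Training would adjust the values in `weights` and `biases` so that the output for each example gesture moves towards its target label.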

Statistical

Statistical methods of gesture recognition use classifiers, just as the Hidden Markov Models do. Other approaches include:

Bayesian classifiers
Hidden Markov Model with Gaussian distributions

Because Hidden Markov Model classification requires large training sets and is inflexible, the Bayesian approach might be preferable, as a study in head gesture recognition showed. The sparse classification model used in that study demonstrated the flexibility of the Bayesian approach.

Hidden Markov Models can also be modified to emit continuous distributions. This is useful for continuous data, as it avoids having to transform the observed values into discrete observations.
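With Gaussian emissions, each state scores a continuous observation with a probability density instead of looking up a discrete symbol probability. A minimal sketch, assuming one-dimensional observations and one (mean, variance) pair per state; the function names and example values are hypothetical:

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )

def emission_densities(x, state_params):
    """Density of observation x under each state's (mean, variance) pair."""
    return [gaussian_pdf(x, mean, var) for mean, var in state_params]

# Two states, centred on 0.0 and 5.0; the observation 0.2 fits state 0 best.
densities = emission_densities(0.2, [(0.0, 1.0), (5.0, 1.0)])
print(densities[0] > densities[1])  # True
```

These densities would stand in for the discrete emission probabilities when the likelihood of an observed sequence is calculated.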


The Wii3D Project Background