The overall goal of hand gesture recognition is the interpretation of what the location, posture, or gesture of the hand(s) conveys.
Gesture recognition can be conducted in two ways. Either a data glove is used, which transforms body flexions into movement information, or a vision-based approach is applied, where a camera serves as a human eye to record body positions, which are then extracted using image processing [5].
The first method can deliver precise results, but a data glove is rather uncomfortable from the user's point of view, and the required equipment would be unacceptably costly for most ordinary customers, making it suitable only for special applications. The vision-based approach, on the other hand, requires no equipment on the end user's side except the camera, which makes it suitable for general applications [5]. Its drawback, however, lies in algorithmic complexity: a considerable amount of time and computing power is required to extract body movements [5].
There are various algorithms available which focus on different aspects of the gesturing person (and make different assumptions). Generally, they can be divided into two categories: appearance-based and 3D model-based approaches. The 3D model-based approach compares the input parameters of a limb with the 2D projection of a 3D limb model. The appearance-based approach uses image features to model the visual appearance of a limb and compares this model with features extracted from the video input [5].
In Section 4.1 three gesture types were defined. A general classifier is used to detect static gestures (i.e. postures). A classifier is understood here as a component that decides to which group or category an input belongs; in other words, it tells us which gesture or pose was recognized. Dynamic hand gestures, however, have a temporal aspect and require techniques that handle this dimension, e.g. Hidden Markov Models (HMM). Another option is a motion-based model.
Some of the techniques used for static (and dynamic) hand gesture recognition are the K-means algorithm, KNN, SVM, the already mentioned HMM, DTW, and neural networks [3].
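As an illustration only (not a method prescribed in [3]), the following sketch shows how pre-extracted static posture features could be fed to a KNN and an SVM classifier; the feature extraction itself, the variable names, and the parameter values are assumptions.

```python
# Illustrative sketch: classifying static hand postures with k-NN and SVM.
# Feature vectors (e.g. normalized contour descriptors or finger counts)
# and labels are assumed to be available.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def train_posture_classifiers(features: np.ndarray, labels: np.ndarray):
    """Train two simple classifiers on pre-extracted posture features."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

    print("k-NN accuracy:", knn.score(X_test, y_test))
    print("SVM accuracy:", svm.score(X_test, y_test))
    return knn, svm
```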
For faster data processing, the whole image area is not processed. After the hand is automatically detected, only the area around the hand is further processed, which reduces the processing load. The obtained depth distances are converted into a grayscale image from which the contour of the hand is extracted.
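A minimal sketch of this step, assuming a depth frame and an already detected hand bounding box; the variable names and the normalization range are illustrative.

```python
# Crop the region around the detected hand and map depth values to grayscale.
import cv2
import numpy as np

def hand_roi_to_grayscale(depth_frame: np.ndarray, bbox):
    """Crop the area around the detected hand and convert depth to 8-bit grayscale."""
    x, y, w, h = bbox                                   # hypothetical bounding box
    roi = depth_frame[y:y + h, x:x + w].astype(np.float32)

    # Normalize the measured distances into 0..255 so that the hand
    # contour can later be extracted from a grayscale image.
    gray = cv2.normalize(roi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return gray
```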
Two methods for the gesture recognition process are described below.
Convexity defects
In the first step the hand must be separated from the background. The separation can be done using depth information, determining which pixels of the image belong to the hand. The second basic step is to detect the contour of the hand.
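The following sketch illustrates these two base steps with OpenCV, under the assumption that the hand lies within a known depth range; the threshold values are illustrative only.

```python
# Separate the hand from the background by depth and detect its contour.
import cv2
import numpy as np

def segment_hand_and_contour(depth_gray: np.ndarray, near=10, far=120):
    """Return a binary hand mask and the largest contour found in it."""
    # Keep only pixels whose depth value lies in the assumed hand range.
    mask = ((depth_gray >= near) & (depth_gray <= far)).astype(np.uint8) * 255

    # The largest connected contour is taken as the hand outline.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return mask, None
    hand_contour = max(contours, key=cv2.contourArea)
    return mask, hand_contour
```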
The shapes of many complex objects are well characterized by convexity defects. Fig. 4.3 illustrates the concept of a convexity defect using an image of a human hand. The convex hull is pictured as a dark line around the hand, and the regions labelled A through H are each “defects” relative to that hull. As can be seen, these convexity defects also characterize the state of the hand. The algorithm returns the coordinates of three points for each defect: the start point, the deepest point, and the end point (Fig. 4.4), where the deepest point is the point of the hand contour with the maximum distance from the hull.
The goal of the algorithm is to find the point on the finger that is farthest from the centre of the hand. The first step is to remove all defects whose depth is less than a specified value. Then the defects whose distance between the start point and the deepest point is greater than a specified, dynamically defined value are removed (if the two points are too far apart, they cannot represent a finger). Next, the defects whose distance between the start point and the deepest point is less than a specified value are also removed. This value changes dynamically according to the size of the region where the hand is detected (Fig. 4.5). The final step is to remove all defects that occur below the wrist.
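A possible implementation sketch of the defect detection and filtering described above is shown below. All threshold values are assumptions and would in practice be derived dynamically from the size of the detected hand region; the wrist position is treated as a given parameter.

```python
# Detect convexity defects on the hand contour and filter them heuristically.
import cv2
import numpy as np

def fingertip_defects(contour, min_depth=20.0, min_span=15.0,
                      max_span=200.0, wrist_y=None):
    """Return (start, deepest, end) point triples of defects that may belong to fingers."""
    hull_idx = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull_idx)
    if defects is None:
        return []

    kept = []
    for s, e, f, d in defects[:, 0]:
        start, end, deep = contour[s][0], contour[e][0], contour[f][0]
        depth = d / 256.0                        # defect depth ("height")
        span = np.linalg.norm(start - deep)      # start point to deepest point

        if depth < min_depth:                    # too shallow to be a finger gap
            continue
        if span > max_span or span < min_span:   # too far apart or too close
            continue
        if wrist_y is not None and deep[1] > wrist_y:
            continue                             # defect lies below the wrist
        kept.append((tuple(start), tuple(deep), tuple(end)))
    return kept
```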
Part-based hand gesture recognition
In the first step of this algorithm, data segments containing hands are obtained and converted into a binary image. The centre of the palm (Fig. 4.6) is computed using the inner circle and a point on the contour hull of the hand that has the maximal distance from the found defect.
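One common way to approximate such a palm centre is to take the point of the binary hand mask that is farthest from the background, i.e. the centre of the largest inner (inscribed) circle, obtained from a distance transform. The sketch below shows this idea; it is not necessarily the exact construction used by the method described here.

```python
# Approximate the palm centre as the centre of the largest inscribed circle.
import cv2
import numpy as np

def palm_centre(hand_mask: np.ndarray):
    """Return (x, y) of the palm centre and the inner-circle radius (mask is 8-bit binary)."""
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
    _, radius, _, centre = cv2.minMaxLoc(dist)   # maximum distance and its location
    return centre, radius
```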
All points of the hand contour are mapped onto the X-axis. The Y-axis then describes the relative distance of each point from the centre of the palm. The mapped points form a curve (Fig. 4.7).
The next step of the algorithm is analysis of the curve with the aim of finding local maxima. After the curve analysis is finished, finger extraction starts. The distance of each maximum is compared to a set threshold. Each finger has a specific weight, and fingers are detected based on defined relations (a relation compares the weight with given values; e.g. if weight < 1.5 × mean weight, the segment contains one finger, etc.). This algorithm also works for joined fingers (Fig. 4.8).
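The distance-curve analysis can be sketched as follows; the peak threshold relative to the mean distance and the minimal peak spacing are assumptions, and the weight-based relations used for joined fingers are omitted.

```python
# Map contour points to their distance from the palm centre and find local maxima.
import numpy as np
from scipy.signal import find_peaks

def finger_candidates(contour, centre, min_rel_height=1.3, min_spacing=10):
    """Return indices of contour points that are local maxima of the distance curve."""
    pts = contour[:, 0, :].astype(np.float32)            # (N, 2) contour points
    dists = np.linalg.norm(pts - np.array(centre, np.float32), axis=1)

    # Keep only maxima that stand out clearly above the mean distance.
    peaks, _ = find_peaks(dists,
                          height=min_rel_height * dists.mean(),
                          distance=min_spacing)
    return peaks, dists
```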
Dynamic gestures can provide access to a user's private content or can be used to control a system or an application. They can also serve as a password key. In the early days of gesture recognition, mostly neural networks and genetic algorithms were used.
These methods had an acceptable recognition rate, but their greatest drawback was the amount of computing power and time needed for training the neural networks, which was unacceptably high for practical applications.
Nowadays, new techniques are used to recognize gestures. Algorithms that do not require neural networks have been developed, for example Golden Section Search, the Incremental Recognition Algorithm, and probabilistic models such as the Hidden Markov Model. Machine learning can be used to increase the success rate of the mentioned algorithms. Many approaches to gesture recognition exist. HMM methods have been very popular in recent years, mainly because the HMM approach is well known and used in many areas.
The algorithm proposed by Kristensson and Denby [4], originally intended for digital pen strokes and touch-screen devices, can be extended to dynamic gestures as well.
In this approach, a template is defined as a set of segments describing the gesture. Given a set of gestures that are sufficiently distinguishable from each other, recognition may succeed after only a part of the gesture has been performed (Fig. 4.9).
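A much simplified sketch of this idea is shown below: the partially performed trajectory is compared against prefixes of every stored template, and a template is accepted as soon as its prefix distance falls below a threshold. It only illustrates the principle and is not the exact algorithm of Kristensson and Denby [4]; the resampling length and the acceptance threshold are assumptions, and trajectories are assumed to be normalized to a common scale.

```python
# Incremental matching of a partial gesture trajectory against template prefixes.
import numpy as np

def resample(points: np.ndarray, n: int) -> np.ndarray:
    """Resample a 2D trajectory to n evenly spaced points (by index)."""
    idx = np.linspace(0, len(points) - 1, n)
    return np.stack([np.interp(idx, np.arange(len(points)), points[:, k])
                     for k in (0, 1)], axis=1)

def incremental_match(partial, templates, n=32, accept=0.1):
    """Compare the partial trajectory against prefixes of every template."""
    a = resample(np.asarray(partial, float), n)
    best_name, best_dist = None, np.inf
    for name, tmpl in templates.items():
        tmpl = np.asarray(tmpl, float)
        # Try prefixes of increasing length; the best one indicates how far
        # into this template the partial gesture could currently be.
        for end in range(2, len(tmpl) + 1):
            b = resample(tmpl[:end], n)
            d = np.mean(np.linalg.norm(a - b, axis=1))
            if d < best_dist:
                best_name, best_dist = name, d
    return (best_name, best_dist) if best_dist < accept else (None, best_dist)
```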
The set of gestures is extended by swipe gestures. This type of gesture offers a very natural and comfortable way of interaction. Swipe gestures are designed for fast, routine browsing in menus, programs, and galleries, and consist of four directions for each hand plus several left-right hand combinations. The method called Circle Dynamic Gesture Recognition (CDGR), published in [1], is based on hand detection, the speed of movement, and distance. If the hand executes a fast motion and moves out of the inner circle past the outer circle, the system processes this motion and determines a gesture. The gesture is given by the angle of the executed motion from the middle towards the outer circle. The possible gestures are swipe left, right, up, and down. The same gestures can also be made with both hands. The user can also perform zoom-in and zoom-out gestures (Fig. 4.10).
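As a rough sketch of how such a swipe could be derived from the angle of the motion (the angle bins and the coordinate convention are assumptions, not the exact CDGR parameters from [1]):

```python
# Map a fast hand motion vector to one of the four swipe gestures by its angle.
import math

def swipe_from_motion(start, end):
    """Classify the motion from the inner towards the outer circle as a swipe direction."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.degrees(math.atan2(-dy, dx)) % 360   # image Y axis grows downwards

    if angle < 45 or angle >= 315:
        return "swipe right"
    if angle < 135:
        return "swipe up"
    if angle < 225:
        return "swipe left"
    return "swipe down"
```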