4 Gesture navigation
4.2 Image data acquisition

Early research on gesture recognition algorithms was based on RGB (Red Green Blue) camera input. This approach brings some advantages, which, however, are outweighed by its disadvantages. An RGB image consists only of color channels, and many factors can negatively influence its quality.

A change in illumination can radically degrade image quality. When the lighting conditions in a room change, the brightness and contrast of the image change with them. This makes recognition based on RGB images unreliable [2].

In the last few years, researchers have started to use depth images for processing. A depth image is a two-dimensional image that carries an additional piece of information: depth. Most depth sensors are based on infrared (IR) emission (e.g. the Kinect sensor in Fig. 4.2). The sensor returns data that represent the distance of each pixel in the frame from the sensor. The obtained distances can easily be transformed into a grayscale representation, so the depth data can be treated as a grayscale video sequence. Obtaining the data does not require any special conditions.
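
As an illustration, the sketch below grabs a single depth frame through OpenCV's OpenNI backend. It assumes an OpenCV build with OpenNI2 support and an OpenNI-compatible sensor such as the Kinect; treat it as a minimal example, not the exact acquisition pipeline used in this work.

```python
import cv2

# Open the first OpenNI-compatible depth sensor (e.g. Kinect).
# Requires an OpenCV build with OpenNI2 support.
cap = cv2.VideoCapture(cv2.CAP_OPENNI2)
if not cap.isOpened():
    raise RuntimeError("No OpenNI-compatible depth sensor found")

if cap.grab():
    # Retrieve the depth map: a 16-bit image whose pixels hold the
    # distance of each point from the sensor in millimeters.
    ok, depth_mm = cap.retrieve(None, cv2.CAP_OPENNI_DEPTH_MAP)
    if ok:
        print(depth_mm.shape, depth_mm.dtype)  # e.g. (480, 640) uint16

cap.release()
```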

Depth data have several advantages over RGB camera data. The main advantage is that the depth camera works with infrared light, and the infrared band carries much less noise, so an infrared sensor is far less sensitive to lighting conditions than an RGB sensor [2].

Fig. 4.2 – Image from depth camera

The distance of each pixel from the sensor is given in millimeters.

If we want to convert the depth image array into a grayscale image, we need to know the minimal and maximal possible distance:

g = ((d − d_min) / (d_max − d_min)) · 255        (003)

where d is the actual distance of the given pixel, d_min is the minimal possible distance from the sensor and d_max is the maximal possible distance from the sensor [2].
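
A minimal NumPy sketch of this conversion is shown below. The 500–4000 mm range used for d_min and d_max is only an illustrative assumption; the real limits depend on the particular sensor.

```python
import numpy as np

def depth_to_grayscale(depth_mm: np.ndarray,
                       d_min: float = 500.0,
                       d_max: float = 4000.0) -> np.ndarray:
    """Map depth values in millimeters onto 0-255 grayscale intensities.

    d_min and d_max are the minimal and maximal distances the sensor can
    measure; the 500-4000 mm defaults are illustrative assumptions only.
    """
    d = np.clip(depth_mm.astype(np.float32), d_min, d_max)
    gray = (d - d_min) / (d_max - d_min) * 255.0
    return gray.astype(np.uint8)

# Example: a tiny synthetic 2x3 depth frame with distances in millimeters.
frame = np.array([[600, 1200, 2400],
                  [800, 3000, 3900]], dtype=np.uint16)
print(depth_to_grayscale(frame))
```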

The image data acquisition process can be aided by extra elements, such as a luminous bracelet, a ring, or a small ball held in the hand.