Multimedia processing
Face animation

The human face can be modeled and animated by many methods. To choose the most suitable one, we first need to define the requirements for our animation module.

There are two basic distinctions in face animation: two-dimensional versus three-dimensional animation, and real-time versus forward-calculated (pre-computed) animation.

Real-time animation allows users to intervene interactively in the animation and reduces the time needed to prepare it. Its disadvantage is lower image quality: the calculation of each image should take no longer than approximately 0.05 s, because at least 20 images per second (20 FPS, frames per second) are needed. The quality of three-dimensional modeling is better than that of the two-dimensional approach, since it models reality more naturally.
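The frame budget above follows directly from the target frame rate: 1/20 s = 0.05 s per image. A minimal sketch of a real-time loop that respects this budget (the loop and its names are illustrative, not part of the described system):

```python
import time

TARGET_FPS = 20
FRAME_BUDGET = 1.0 / TARGET_FPS  # 0.05 s of computation allowed per frame

def run_realtime(render_frame, duration=1.0):
    """Call render_frame repeatedly; sleep away any time left in the
    per-frame budget so the animation runs at a steady rate."""
    frames = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        t0 = time.perf_counter()
        render_frame(frames)
        elapsed = time.perf_counter() - t0
        if elapsed < FRAME_BUDGET:
            time.sleep(FRAME_BUDGET - elapsed)  # keep a steady 20 FPS
        frames += 1
    return frames
```

If rendering one frame takes longer than the budget, the effective frame rate drops below 20 FPS, which is exactly the quality trade-off mentioned above.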

The following methods are known for face animation: interpolation, parameterization, muscle simulation, etc.

There are various techniques for modeling a human face in space, for example polygonal modeling, modeling by parametric surfaces, and subdivision modeling.

Talking head on mobile phone

This chapter introduces a multimedia speech synthesis project. It describes an application combining a speech synthesizer with face animation on a mobile phone. The ultimate goal of the project was to develop a multimedia communication system that does not need to transfer any video or audio data. The result is a mobile phone Java application for reading short messages (SMS, short message service). After an SMS is received, a talking head based on the sender's photo appears on the screen and animates the reading while the speech is synthesized in parallel.

In our example, both the model and the visemes needed for the face animation are in the Wavefront object (OBJ) format, which means the files contain not only the geometry but also a reference to the texture that belongs to it.
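As a rough illustration of what such a file contains, the following minimal sketch (not the project's actual loader) reads vertex positions, texture coordinates, and faces from an OBJ file; all other statements are ignored:

```python
def load_obj(path):
    """Parse vertex positions (v), texture coordinates (vt) and
    faces (f) from a Wavefront OBJ file."""
    vertices, texcoords, faces = [], [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "v":
                vertices.append(tuple(float(x) for x in parts[1:4]))
            elif parts[0] == "vt":
                texcoords.append(tuple(float(x) for x in parts[1:3]))
            elif parts[0] == "f":
                # A face entry may look like "v", "v/vt" or "v/vt/vn";
                # keep only the (1-based) vertex index.
                faces.append([int(p.split("/")[0]) - 1 for p in parts[1:]])
    return vertices, texcoords, faces
```

A real loader would also follow the `mtllib`/`usemtl` statements to find the texture image, but the structure above is what makes viseme interpolation possible: every viseme file lists the same vertices in the same order.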

Neutral face model and face texture

The viseme we refer to is a deformed model of the face; not just any deformation, but one that shapes the face as if it were pronouncing the given phoneme. A viseme model still has the same number of nodes, with the same numbering scheme, connected by the same edges as the neutral model; only the positions of the nodes change. Thanks to this simplification we can easily interpolate the node positions and the orientation of the subsurfaces defined by the nodes.

The animation itself is realized by the aforementioned viseme interpolation. The neutral model is read from a file, together with the other models and visemes, and the interpolation is performed between these models. The animation is driven by time (it is a real-time type of animation). The model is also deformed based on the results of face-point detection.
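Since the animation is driven by time, each frame needs an interpolation weight derived from the current time and the interval of the active viseme. One way to compute it (a hypothetical helper, not taken from the project code):

```python
def viseme_weight(t, t_start, t_end):
    """Map the current time t to an interpolation weight in [0, 1]:
    0 at the start of the viseme interval, 1 at its end, clamped outside."""
    if t_end <= t_start:
        return 1.0  # degenerate interval: jump straight to the target
    return min(1.0, max(0.0, (t - t_start) / (t_end - t_start)))
```

At each rendered frame the weight is recomputed from the wall-clock time, so the mouth movement stays synchronized even if some frames are dropped.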

Face visemes

Speech synthesis uses the diphone speech synthesizer described in the chapter Speech synthesis. The synthesized speech has to be synchronized with the speaking face.
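The synchronization can be sketched as mapping the synthesizer's phoneme timing onto a viseme schedule for the animation. The phoneme names, the mapping table, and the timing format below are illustrative assumptions; the real table depends on the language and on the viseme set shipped with the face model:

```python
# Illustrative phoneme-to-viseme mapping (assumed, not from the project).
PHONEME_TO_VISEME = {
    "a": "viseme_a",
    "o": "viseme_o",
    "m": "viseme_closed",
    "sil": "neutral",
}

def viseme_schedule(phoneme_timing):
    """phoneme_timing: list of (phoneme, start_s, duration_s) tuples as
    produced by the synthesizer; returns (viseme, start_s) pairs telling
    the animation which viseme to interpolate toward and when."""
    return [
        (PHONEME_TO_VISEME.get(phoneme, "neutral"), start)
        for phoneme, start, duration in phoneme_timing
    ]
```

The animation loop then interpolates toward each scheduled viseme so that the mouth shape reaches it at the corresponding start time of the phoneme.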

Application on a mobile phone: personalized face (first image) and general model (second and third images)