Real-time head and
facial feature tracking

Our approach to facial feature reconstruction is a two-stage process that separates head tracking from the reconstruction of the facial features. First, a 3D textured polygonal model is fitted to an image sequence acquired with a single, uncalibrated camera. The model texture is initially reconstructed directly from the input image, as the 2D texture that matches the rendered model to the video image. The fitting process exploits both motion and texture information: motion information is obtained by evaluating the optical flow between two consecutive frames, while texture information is gathered by warping the image frame into the texture space of the model.

After pose reconstruction, each input image is projected onto the texture map of the model. The resulting image provides a stabilized view of the face, i.e. an image in which the face always has the same position, orientation and size; this view is processed to reconstruct the salient facial features, such as eyes, brows and lips. Feature detection and tracking use a set of multi-state deformable templates, which are fitted to the warped image exploiting motion, shape and color information. The shape parameters of the templates are then output as face animation parameters. By combining the results of both stages it is straightforward to obtain a standard animation parameter stream, e.g. an MPEG-4 stream.
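
The motion cue used in the fitting stage comes from the optical flow between two consecutive frames. The text does not specify the estimator, so as a stand-in illustration the sketch below recovers a single global translation between two frames with phase correlation in pure NumPy; the frame size matches the 320x240 capture constraint, but the frame contents and shift are illustrative assumptions:

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer (dy, dx) shift translating frame `a` into frame `b`."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.abs(R) + 1e-12            # normalized cross-power spectrum
    corr = np.fft.ifft2(R).real       # peak sits at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices above the midpoint back to negative shifts.
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return int(dy), int(dx)

# Two "consecutive frames": a random texture, then the same texture shifted.
rng = np.random.default_rng(0)
frame0 = rng.random((240, 320))
frame1 = np.roll(frame0, shift=(2, 3), axis=(0, 1))  # moved 3 px right, 2 px down
print(phase_correlation(frame0, frame1))             # (2, 3)
```

A dense per-pixel flow field (as the tracker would need for non-rigid head motion) generalizes this idea by estimating local displacements over small windows rather than one global shift.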

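Projecting each frame onto the texture map of the model amounts to an inverse warp: for every texel, look up the corresponding image pixel under the current pose and resample. A minimal 2D sketch, assuming the pose reduces to a planar affine map `M` from texture space to image space (the actual system projects through the full 3D polygonal model; the region, texture size and matrix below are illustrative):

```python
import numpy as np

def warp_to_texture(image, M, tex_shape):
    """Inverse-warp `image` into texture space: texel (u, v) samples image at M @ (u, v, 1)."""
    h, w = tex_shape
    v, u = np.mgrid[0:h, 0:w]                          # texel coordinates
    uv1 = np.stack([u, v, np.ones_like(u)])            # homogeneous coords, shape (3, h, w)
    xy = np.tensordot(M, uv1.reshape(3, -1), axes=1)   # corresponding image coordinates
    x = np.clip(np.round(xy[0]).astype(int), 0, image.shape[1] - 1)
    y = np.clip(np.round(xy[1]).astype(int), 0, image.shape[0] - 1)
    return image[y, x].reshape(h, w)                   # nearest-neighbour resampling

# Example: the "face" occupies a shifted, scaled region of a 320x240 frame;
# warping with the matching affine map yields a stabilized, canonical view.
frame = np.zeros((240, 320), dtype=np.uint8)
frame[60:180, 100:220] = 255                           # bright square standing in for the face
M = np.array([[120 / 64.0, 0.0, 100.0],                # 64x64 texture -> image region
              [0.0, 120 / 64.0, 60.0]])
tex = warp_to_texture(frame, M, (64, 64))
print(tex.min(), tex.max())                            # the face fills the whole texture
```

Because every frame is resampled into the same texture space, the face arrives at the feature-detection stage with constant position, orientation and scale, which is exactly the stabilized view described above.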
The outlined system was designed to meet the following requirements:

  1. it must be non-intrusive, i.e. the subject must not wear any external device
  2. real-time reconstruction rates must be achieved (25-30 frames/s)
  3. images should be acquired with common PC cameras and interface cards, which limits the available images to 320x240 pixel arrays
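
The template-fitting stage mentioned earlier can be illustrated at its simplest. The paper's multi-state deformable templates combine motion, shape and color cues; the sketch below shows only the shape-fitting step, with a single parabolic template (e.g. an eyelid or lip arc) fitted by least squares to detected edge points. The template form, point set and noise level are illustrative assumptions:

```python
import numpy as np

def fit_parabola_template(xs, ys):
    """Least-squares fit of a parabolic template y = a*x^2 + b*x + c to feature points."""
    A = np.stack([xs**2, xs, np.ones_like(xs)], axis=1)
    params, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return params  # (a, b, c): a encodes curvature, e.g. eyelid openness

# Synthetic eyelid edge points sampled from a known arc, with mild noise.
rng = np.random.default_rng(1)
xs = np.linspace(-10.0, 10.0, 21)
ys = 0.05 * xs**2 - 2.0 + rng.normal(0.0, 0.01, xs.size)
a, b, c = fit_parabola_template(xs, ys)
print(round(float(a), 2), round(float(c), 1))  # recovers roughly 0.05 and -2.0
```

In the full system, the fitted shape parameters of each template are what gets emitted as face animation parameters, frame by frame, in the output stream.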
