2 Speaker identification
2.5 Environment compensation

Several techniques have been developed to reduce the session variability caused by mismatched training and testing conditions (background noise, differing acoustic properties of recording devices and rooms). The most basic methods normalize the signal's dynamics by adjusting its overall power, or equalize the power of each frequency band of the averaged speech spectrum, which is usually done by cepstral mean subtraction. It is further possible to apply fixed filtering techniques that emphasize the components typical of speech, such as amplification of the speech modulation spectrum or relative spectral analysis (RASTA) filtering. More sophisticated methods try to find an optimal transformation mapping enrolment features to the features observed in the deployment environment (so-called feature mapping), or to transform whole speaker models to match a model of the deployment environment (speaker model synthesis). These methods are mathematically more involved and adapt their behaviour to the incoming data, so if the deployment environment changes, the estimated mapping changes with it.
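As an illustration of the simpler feature-domain techniques mentioned above, the following Python sketch implements cepstral mean subtraction and a RASTA-style band-pass filtering of cepstral trajectories. The RASTA coefficients follow a commonly cited formulation (numerator proportional to [-2 -1 0 1 2], single pole at 0.94) and are given here only as an assumed example, not as the exact parameters used in this work.

```python
import numpy as np
from scipy.signal import lfilter

def cepstral_mean_subtraction(cepstra):
    """Remove the per-coefficient mean over the utterance.

    cepstra: array of shape (num_frames, num_coeffs), e.g. MFCCs.
    A stationary convolutional channel appears as an additive offset
    in the cepstral domain, so subtracting the mean equalizes it.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def rasta_filter(cepstra):
    """Band-pass filter each cepstral trajectory over time.

    Coefficients are an assumed, commonly used RASTA formulation;
    they suppress very slow (channel-like) and very fast variations
    while keeping modulation rates typical of speech.
    """
    numer = -np.arange(-2, 3) / 10.0      # [0.2, 0.1, 0.0, -0.1, -0.2]
    denom = np.array([1.0, -0.94])
    return lfilter(numer, denom, cepstra, axis=0)
```

In practice, cepstral mean subtraction is applied per utterance or over a sliding window in online systems, while the RASTA filter operates frame by frame along each coefficient trajectory.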

Another, less sophisticated but sometimes useful, solution is to keep pre-recorded speech samples (features or models) from several different conditions and, prior to recognition, detect which of them matches the particular recording best; recognition then uses the best-matching environment. Unsurprisingly, the best results are obtained when the training and testing environments match.
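One simple way to realize this selection, sketched below under the assumption that each pre-recorded environment is summarized by a single diagonal-covariance Gaussian over the features (the function and variable names are hypothetical), is to score the test features against every stored environment model and keep the best-scoring one.

```python
import numpy as np

def select_environment(test_features, env_models):
    """Pick the pre-recorded environment whose model best explains the test features.

    test_features: (num_frames, num_coeffs) feature matrix of the recording.
    env_models: dict mapping environment name -> (mean, var), each of shape
                (num_coeffs,), estimated offline from recordings made in
                that environment.
    Returns the name of the highest-scoring environment.
    """
    def avg_log_likelihood(x, mean, var):
        # Per-frame log-likelihood under a diagonal-covariance Gaussian,
        # averaged over all frames of the recording.
        ll = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
        return ll.sum(axis=1).mean()

    return max(env_models,
               key=lambda name: avg_log_likelihood(test_features, *env_models[name]))
```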

For a more detailed overview of speaker recognition, see, e.g., [2].