Classification of visual speech features

Recognition is not affected by the temporal transitions between the frames of the video sequence; a single frame is enough for letter recognition.

⇒ Use static classifiers; there is no need to model the temporal evolution of the data:

Neural network classifier:
used for vowel recognition in [Shinchi];
network structure: 8-unit input layer,
20-unit hidden layer, 5-unit output layer.
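The 8-20-5 layout can be illustrated with a minimal forward pass that classifies the feature vector of one frame on its own, with no temporal model. This is a sketch under stated assumptions, not the network from [Shinchi]: the weights are random and untrained (in practice they would be learned, e.g. by backpropagation), and the sigmoid hidden activation, softmax output and the names `classify_frame` and `frame_features` are hypothetical choices.

```python
import math
import random

# Layer sizes taken from the slide: 8 input, 20 hidden, 5 output units.
N_IN, N_HID, N_OUT = 8, 20, 5

def init_layer(n_in, n_out, rng):
    """Hypothetical random initialisation: one weight row per output unit, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Fully connected layer: weighted sum of the inputs plus a bias per unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    s = sum(exps)
    return [e / s for e in exps]

def classify_frame(features, params):
    """Static classification: one frame's feature vector -> (class index, class scores)."""
    (w1, b1), (w2, b2) = params
    hidden = sigmoid(dense(features, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

rng = random.Random(0)
params = (init_layer(N_IN, N_HID, rng), init_layer(N_HID, N_OUT, rng))
frame_features = [rng.random() for _ in range(N_IN)]  # hypothetical visual features of one frame
label, probs = classify_frame(frame_features, params)
print(label, probs)
```

Because each frame is classified independently, the same call can simply be applied to any single frame of the sequence.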
Probabilistic classifier: used for recognition of the letters A–Z in [Matthews]; decision rule:

Aristotle University of Thessaloniki

Isolated letter recognition:

OA = audio observations; OV = video observations.
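The decision rule itself is not recoverable from this version of the slide. As a hedged illustration only, one common Bayesian choice for combining the two observation streams picks the letter c that maximises log P(c) + log p(OA | c) + log p(OV | c), assuming OA and OV are conditionally independent given the letter. This is not necessarily the rule used in [Matthews]; the diagonal-Gaussian class models, the toy parameters and the function names `decide` and `log_gaussian_diag` are all hypothetical.

```python
import math
import string

def log_gaussian_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance (assumed model)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def decide(o_a, o_v, models, priors):
    """Return the letter maximising log P(c) + log p(OA|c) + log p(OV|c).

    Assumes audio and video observations are conditionally independent given c.
    """
    def score(c):
        audio_mean, audio_var, video_mean, video_var = models[c]
        return (math.log(priors[c])
                + log_gaussian_diag(o_a, audio_mean, audio_var)
                + log_gaussian_diag(o_v, video_mean, video_var))
    return max(models, key=score)

# Toy class models (made-up numbers): letter number k has audio mean [k, k] and video mean [k].
letters = string.ascii_uppercase
models = {c: ([float(k), float(k)], [1.0, 1.0], [float(k)], [1.0])
          for k, c in enumerate(letters)}
priors = {c: 1.0 / len(letters) for c in letters}  # uniform priors over A-Z

print(decide([2.1, 1.9], [2.2], models, priors))  # prints C
```

Working in the log domain turns the product of probabilities into a sum and avoids numerical underflow when many feature dimensions are involved.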
