Viseme Recognition using lip curvature and Neural Networks to detect Bangla Vowels

Nahid Akhter, Amitabha Chakrabarty


Automatic speech recognition plays an important role in human-computer interaction and can be applied in areas such as crime-fighting and assisting the hearing-impaired. This paper presents a new method for recognizing Bengali visemes that combines image-based lip segmentation, the curvature of both the inner and outer lips, and neural networks. The method has three steps. The first is lip segmentation, which combines the red exclusion method with the HSV and CIE color spaces to produce illumination-invariant images; the inner and outer lips are then extracted separately using a new curve-fitting technique. The second is feature extraction, which uses the quadratic curve coefficients of the inner and outer lip contours. The third is viseme recognition, performed by a neural network. A dataset was created with 171 lip images of Bangla visemes spoken by different speakers under different lighting conditions. The proposed method achieved a viseme recognition rate of 87.3%. Because it uses a non-iterative method in place of conventional approaches, the algorithm was also found to detect lip contours faster.
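The feature extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each lip contour is available as a list of (x, y) points and fits y = a*x^2 + b*x + c by closed-form least squares, so no iteration is involved. The function names `fit_quadratic` and `lip_features`, and the idea of concatenating the coefficients into one vector, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_quadratic(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x^2 + b*x + c, solved in closed form."""
    if len(points) < 3:
        raise ValueError("need at least three contour points")
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented matrix, unknowns ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate contour: too few distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def lip_features(contours: Sequence[Sequence[Point]]) -> List[float]:
    """Concatenate the (a, b, c) coefficients of every contour (hypothetical layout)."""
    features: List[float] = []
    for contour in contours:
        features.extend(fit_quadratic(contour))
    return features


if __name__ == "__main__":
    # Synthetic contours, centred on the mouth midpoint (illustrative data only).
    outer = [(x, 0.5 * x * x - 2.0) for x in range(-4, 5)]
    inner = [(x, 0.2 * x * x - 1.0) for x in range(-3, 4)]
    print(lip_features([outer, inner]))
```

With pixel coordinates, subtracting the mouth midpoint from x before fitting keeps the normal equations well conditioned. The resulting coefficient vector is the kind of input that could then be passed to a neural network classifier.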


Active Appearance Model; Artificial Neural Networks; Bangla Viseme Recognition; Lip Reading; Speech Recognition





This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131