Lip Feature Extraction and Feature Evaluation in the Context of Speech and Speaker Recognition

Petar S. Aleksic, Aggelos K. Katsaggelos
Copyright: © 2009 | Pages: 31
ISBN13: 9781605661865 | ISBN10: 1605661864 | ISBN13 Softcover: 9781616925338 | EISBN13: 9781605661872
DOI: 10.4018/978-1-60566-186-5.ch002
Cite Chapter

MLA

Aleksic, Petar S., and Aggelos K. Katsaggelos. "Lip Feature Extraction and Feature Evaluation in the Context of Speech and Speaker Recognition." Visual Speech Recognition: Lip Segmentation and Mapping, edited by Alan Wee-Chung Liew and Shilin Wang, IGI Global, 2009, pp. 39-69. https://doi.org/10.4018/978-1-60566-186-5.ch002

APA

Aleksic, P. S. & Katsaggelos, A. K. (2009). Lip Feature Extraction and Feature Evaluation in the Context of Speech and Speaker Recognition. In A. Liew & S. Wang (Eds.), Visual Speech Recognition: Lip Segmentation and Mapping (pp. 39-69). IGI Global. https://doi.org/10.4018/978-1-60566-186-5.ch002

Chicago

Aleksic, Petar S., and Aggelos K. Katsaggelos. "Lip Feature Extraction and Feature Evaluation in the Context of Speech and Speaker Recognition." In Visual Speech Recognition: Lip Segmentation and Mapping, edited by Alan Wee-Chung Liew and Shilin Wang, 39-69. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-60566-186-5.ch002


Abstract

There has been significant work on investigating the relationship between articulatory movements, vocal tract shape, and speech acoustics (Fant, 1960; Flanagan, 1965; Narayanan & Alwan, 2000; Schroeter & Sondhi, 1994). It has been shown that a strong correlation exists between face motion on the one hand and vocal tract shape and speech acoustics on the other (Grant & Braida, 1991; Massaro & Stork, 1998; Summerfield, 1979, 1987, 1992; Williams & Katsaggelos, 2002; Yehia, Rubin, & Vatikiotis-Bateson, 1998). In particular, dynamic lip information conveys not only correlated but also complementary information to the acoustic speech information. Its integration into an automatic speech recognition (ASR) system, resulting in an audio-visual (AV) system, can potentially increase the system's performance. Although visual speech information is usually used together with acoustic information, there are applications where visual-only (V-only) ASR systems can be employed and achieve high recognition rates. These include small-vocabulary ASR (digits, a small number of commands, etc.) and ASR in the presence of adverse acoustic conditions. The choice and accurate extraction of visual features strongly affect the performance of AV and V-only ASR systems. The establishment of lip features for speech recognition is a relatively new research topic. Although a number of approaches can be used for extracting and representing visual lip information, limited work exists in the literature comparing the relative performance of different features. In this chapter, the authors describe various approaches for extracting and representing important visual features, review existing systems, evaluate their relative performance in terms of speech and speaker recognition rates, and discuss future research and development directions in this area.
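The integration of lip features into an acoustic ASR front end, as described in the abstract, is often done by feature-level ("early") fusion: the visual stream is upsampled to the audio frame rate and the two feature vectors are concatenated per frame. The sketch below illustrates this scheme only; the frame rates and feature dimensions are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

def upsample_visual(visual, n_audio_frames):
    """Linearly interpolate each visual feature dimension so the
    visual stream has one vector per audio frame.

    visual: (n_visual_frames, visual_dim) array.
    """
    n_visual, dim = visual.shape
    src = np.linspace(0.0, 1.0, n_visual)   # original (video-rate) time axis
    dst = np.linspace(0.0, 1.0, n_audio_frames)  # target (audio-rate) axis
    return np.stack(
        [np.interp(dst, src, visual[:, d]) for d in range(dim)], axis=1
    )

def fuse_av(audio, visual):
    """Feature-level AV fusion: concatenate audio and upsampled
    visual features frame by frame."""
    visual_up = upsample_visual(visual, audio.shape[0])
    return np.concatenate([audio, visual_up], axis=1)

# Illustrative example: one second of speech as 100 audio frames of
# 13 MFCCs (100 Hz) and 30 video frames of 6 lip-shape parameters (30 Hz).
audio = np.random.randn(100, 13)
visual = np.random.randn(30, 6)
av = fuse_av(audio, visual)
print(av.shape)  # (100, 19)
```

Decision-level ("late") fusion, where separate audio and visual classifiers are combined afterward, is the usual alternative; the concatenation above is simply the most direct way to feed both streams to a single recognizer.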
