Author(s)
Arun M. Raghavan BS
Gavriel D. Kohlberg MD
Noga Lipschitz MD
Joseph T. Breen MD
Ravi N. Samy MD FACS
Affiliation(s)
University of Cincinnati College of Medicine
Abstract:
Educational Objective: At the conclusion of this presentation, participants should be aware of the potential benefits and structure of a visual speech recognition program for augmenting human speech perception.

Objectives: To evaluate the accuracy and speed achieved by a visual speech recognition program (VSRP) based on a long short-term memory (LSTM) neural network.

Study Design: Prospective study.

Methods: A dual video/infrared camera was used to continuously track 35 points around the lips during speech in real time. A real-time geometric transformation was applied to normalize all tracked points to a common three-dimensional axis. A VSRP consisting of three separate LSTM neural networks, each with a Softmax classification layer, was developed to identify 42 sentences from the Bamford-Kowal-Bench Speech-in-Noise (BKB-SIN) test using these data. Each network was evaluated by 10-fold cross-validation on 2,800 samples covering a 14-sentence subset of the 42 BKB-SIN sentences. The network input consisted of a sequence of data frames of 105 features each, and each network had 800 hidden units. Classification time, defined as the time elapsed between the network receiving an input dataset and returning a classification result, was measured across all 2,800 samples.

Results: The VSRP achieved an average 10-fold cross-validation accuracy across the three networks of 75.90% ± 8.42% (mean ± SD). The average classification time was 7.3 ± 2.3 ms (mean ± SE).

Conclusions: The VSRP achieved a high level of accuracy on sentences taken from a common speech battery. Further evaluation is needed to demonstrate the use of this system in augmenting human speech perception. It may assist those with hearing loss, such as hearing aid or cochlear implant users.
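The classification stage described in the Methods can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: only the dimensions come from the abstract (105 input features per frame, i.e. 35 lip points in three dimensions; 800 hidden units; a Softmax layer over the 14 sentence classes handled by one network). The single-layer LSTM, the final-hidden-state readout, and the random placeholder weights are assumptions made for the sake of the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax over class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_classify(frames, params):
    """Run a single-layer LSTM over a sequence of feature frames and
    return Softmax class probabilities from the final hidden state."""
    W, U, b, W_out, b_out = params
    n_hidden = U.shape[1]
    h = np.zeros(n_hidden)  # hidden state
    c = np.zeros(n_hidden)  # cell state
    for x in frames:
        z = W @ x + U @ h + b            # all four gate pre-activations
        i, f, o, g = np.split(z, 4)      # input, forget, output, candidate
        i = 1.0 / (1.0 + np.exp(-i))
        f = 1.0 / (1.0 + np.exp(-f))
        o = 1.0 / (1.0 + np.exp(-o))
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return softmax(W_out @ h + b_out)

# Dimensions from the abstract: 105 features per frame, 800 hidden
# units, 14 sentence classes per network. Weights here are random
# placeholders standing in for trained parameters.
rng = np.random.default_rng(0)
n_feat, n_hid, n_cls = 105, 800, 14
params = (rng.standard_normal((4 * n_hid, n_feat)) * 0.01,
          rng.standard_normal((4 * n_hid, n_hid)) * 0.01,
          np.zeros(4 * n_hid),
          rng.standard_normal((n_cls, n_hid)) * 0.01,
          np.zeros(n_cls))
frames = rng.standard_normal((60, n_feat))  # e.g. 60 video frames
probs = lstm_classify(frames, params)       # one probability per sentence
```

Classifying a whole sentence from a short landmark sequence with a single forward pass is what makes the millisecond-scale classification times reported in the Results plausible.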