Author(s)
Aki Koivu, PhD
Pin-Yu Lin, MSc
Kristina Simonyan, MD, PhD, DrMed
Matthew R. Naunheim, MD, MBA
Affiliation(s)
Massachusetts Eye and Ear & Harvard Medical School
Abstract:
Objectives: To develop a real-time prediction model that automatically detects laryngeal behaviors from videolaryngoscopy-based pose tracking data during clinical examinations. The model is intended to improve data collection quality by giving clinicians immediate feedback and to enable future automation pipelines for efficient large-scale analysis.
Study Design: Proof-of-concept study of real-time identification of patient behavior states during in-office laryngoscopy using motion tracking.
Methods: We trained a real-time, stateful residual Gated Recurrent Unit (GRU) model on tracking data from 39 laryngeal keypoints, derived from our previously published keypoint detection model, to predict the patient’s laryngeal task state. The states were ‘phonation’, ‘sustained phonation’, ‘swallowing’, ‘idle’, ‘coughing’, ‘sniffing’, and ‘out of view’. A total of 916 segments from 72 laryngoscopy videos were annotated for model training. The resulting model was then evaluated on an independent dataset of 123 segments from 10 videos. Performance was assessed by comparing manual annotations with model predictions using classification metrics and mean temporal intersection over union (mIoU).
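As an illustration of the temporal overlap metric, the following is a minimal sketch that assumes mIoU is computed frame by frame: for each task state, the frames labeled with that state in both sequences are divided by the frames labeled with it in either sequence, and the per-state values are averaged. The function name `temporal_miou` and the toy labels are hypothetical and are not taken from the study; the study's exact definition (for example, segment-based rather than frame-based matching) may differ.

```python
from collections import Counter


def temporal_miou(reference, predicted):
    """Mean per-class IoU between two equal-length per-frame label sequences.

    Assumption: one label per video frame, and classes are averaged with
    equal weight over every label that occurs in either sequence.
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("sequences must not be empty")

    ref_count = Counter(reference)    # frames per class in manual annotation
    pred_count = Counter(predicted)   # frames per class in model output
    intersection = Counter()          # frames where both agree, per class
    for ref_label, pred_label in zip(reference, predicted):
        if ref_label == pred_label:
            intersection[ref_label] += 1

    per_class = {}
    for label in set(ref_count) | set(pred_count):
        # |A union B| = |A| + |B| - |A intersect B|
        union = ref_count[label] + pred_count[label] - intersection[label]
        per_class[label] = intersection[label] / union

    miou = sum(per_class.values()) / len(per_class)
    return miou, per_class


# Toy 10-frame example with invented labels (not study data).
manual = ["idle"] * 4 + ["phonation"] * 4 + ["swallowing"] * 2
model = ["idle"] * 3 + ["phonation"] * 5 + ["swallowing"] * 2
miou, per_class = temporal_miou(manual, model)
print(round(miou, 2), {k: round(v, 2) for k, v in sorted(per_class.items())})
```

In this toy case the model starts 'phonation' one frame early, which lowers the IoU of both 'idle' and 'phonation' even though only one of ten frames is mislabeled; this sensitivity to boundary timing is what the metric adds beyond frame accuracy.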
Results: The model achieved a mean accuracy of 92% and an mIoU of 0.82 on the independent test dataset, indicating both close agreement with manual annotations and accurate temporal localization of laryngeal task states. At the class level, the model performed consistently across most state categories, with per-class F1-scores on the test set ranging from 83% for phonation to 97% for sniffing. On the validation set, the lowest F1-score was observed for coughing (82% [70%–92%]).
Conclusions: This study demonstrates that our state classification model can identify patients’ laryngeal behaviors, such as swallowing and phonation, from the xy coordinates produced by our existing keypoint tracking model. Future work will focus on refining precision and further characterizing the clinical utility of the proposed model.