Word-Level Sign Language Recognition from Videos

dc.contributor.advisorRangwala, Huzefa
dc.contributor.advisorKosecka, Jana
dc.contributor.authorHosain, Al Amin
dc.creatorHosain, Al Amin
dc.date.accessioned2022-08-03T20:18:28Z
dc.date.available2022-08-03T20:18:28Z
dc.date.issued2021
dc.description.abstractSign language is the primary form of communication among Deaf and Hard of Hearing (DHH) individuals. Because they depend on speech, voice-controlled assistants such as Apple Siri or Amazon Alexa are not readily accessible to DHH individuals. An automated sign language recognizer can serve as an interface between a DHH individual and voice-controlled digital devices. Recognizing word-level sign gestures is the first step of an automated sign language recognition system. These gestures are characterized by fast, highly articulated motion of the upper body, including arm movements with complex hand shapes. The primary challenge of a word-level sign language recognizer (WLSLR) is to capture the hand shapes and their motion components. Additional challenges arise from the resolution of the available video, differences in gesture speed, and large variations in gesture-performing style across subjects. In this dissertation, we study different methods with the goal of improving video-based WLSLR systems. Toward this goal, we introduced a multi-modal American Sign Language (ASL) dataset, GMU-ASL51. This publicly available dataset features multiple modalities and 13,107 word-level ASL sign videos. We implemented machine learning methods using video input alone and a fusion of video and body-pose data. Word-level sign videos typically vary in length, roughly from 10 to 200 frames, depending on the source and type of the sign video. To utilize the frame-wise representation of hand shapes, we implemented Recurrent Neural Network (RNN) models using per-frame hand-shape features extracted from a pre-trained Convolutional Neural Network (CNN). To further improve hand-shape representation, we proposed a hand-shape annotation method that can quickly annotate hand-shape images while simultaneously training a CNN model.
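The RNN-over-CNN-features pipeline described in the abstract can be illustrated with a minimal toy sketch. This is not the dissertation's actual architecture: the feature dimension, hidden size, random (untrained) weights, and the simple Elman recurrence are all illustrative assumptions; only the 51-class setting echoes GMU-ASL51.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 512-d per-frame CNN hand-shape features,
# hidden size 64, 51 sign classes (echoing GMU-ASL51).
FEAT, HID, CLASSES = 512, 64, 51

# Randomly initialized weights stand in for trained parameters.
W_in = rng.standard_normal((FEAT, HID)) * 0.01
W_h = rng.standard_normal((HID, HID)) * 0.01
W_out = rng.standard_normal((HID, CLASSES)) * 0.01

def classify_sign(frame_features):
    """Run a simple Elman RNN over per-frame CNN features and
    classify the sign from the final hidden state."""
    h = np.zeros(HID)
    for x in frame_features:  # one 512-d feature vector per video frame
        h = np.tanh(x @ W_in + h @ W_h)
    logits = h @ W_out
    return int(np.argmax(logits))

# Videos vary in length (roughly 10-200 frames in the data described above);
# the recurrence handles any sequence length without padding.
video = rng.standard_normal((37, FEAT))
label = classify_sign(video)
```

Because the hidden state is updated frame by frame, the same model accepts videos of any length, which is the property that makes RNNs a natural fit for the variable-length sign videos described above.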
We later used this model as a hand-shape feature extractor for the downstream sign recognition task. Most of the information in sign language is conveyed through hand and arm movements. To prioritize hand- and arm-related features, we proposed a pose-guided feature localization method that operates on the 3D feature maps of a 3D CNN model. This method tracks the location of the hands in feature-map space and extracts representative hand features from a sign video. To further leverage the idea of hand representation, we developed a graph-based hand model. This formulation treats the hands as graphs and models the finger structure using a Graph Convolutional Network (GCN). When combined with existing models in an ensemble, the graph-based modeling yielded additional recognition gains. In an attempt to build an interface between DHH individuals and voice assistants, this dissertation presents the building blocks of a video-based WLSLR, ranging from the development of a multi-modal dataset to improvements over state-of-the-art video classification models. We demonstrate the roles of hand shapes and pose data in several contexts of sign video modeling. We anticipate that the data and the insights that emerged from this work will help advance research toward an automated sign language interpreter.
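The graph-based hand formulation can likewise be sketched with a single graph-convolution layer in numpy. The 21-joint hand skeleton, the edge list, and the Kipf-and-Welling-style symmetric normalization are assumptions for illustration; the dissertation's GCN details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 21-joint hand skeleton (wrist + 4 joints per finger),
# with edges following the finger bone structure.
N_JOINTS = 21
edges = [(0, i) for i in (1, 5, 9, 13, 17)]  # wrist to finger bases
edges += [(b + k, b + k + 1) for b in (1, 5, 9, 13, 17) for k in range(3)]

# Adjacency with self-loops, symmetrically normalized.
A = np.eye(N_JOINTS)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))

def gcn_layer(X, W):
    """One graph convolution: aggregate each joint's neighbors via the
    normalized adjacency, project with W, then apply ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)

# 3-d joint coordinates in, 16-d per-joint embeddings out.
X = rng.standard_normal((N_JOINTS, 3))
W = rng.standard_normal((3, 16)) * 0.1
H = gcn_layer(X, W)  # shape (21, 16)
```

Each row of `H` is an embedding of one joint that mixes in information from its neighboring joints, which is how a GCN can capture finger structure rather than treating joint coordinates independently.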
dc.format.extent133 pages
dc.identifier.urihttps://hdl.handle.net/1920/12925
dc.language.isoen
dc.rightsCopyright 2021 Al Amin Hosain
dc.subjectComputer science
dc.subjectCNN
dc.subjectHuman Pose Data
dc.subjectNeural Networks
dc.subjectRNN
dc.subjectSign Language
dc.subjectVideo Modeling
dc.titleWord-Level Sign Language Recognition from Videos
dc.typeDissertation
thesis.degree.disciplineComputer Science
thesis.degree.grantorGeorge Mason University
thesis.degree.levelPh.D.
thesis.degree.namePh.D. in Computer Science

Files

Original bundle
Name:
Hosain_gmu_0883E_12700.pdf
Size:
7.56 MB
Format:
Adobe Portable Document Format