Word-Level Sign Language Recognition from Videos

dc.contributor.advisorRangwala, Huzefa
dc.contributor.advisorKosecka, Jana
dc.contributor.authorHosain, Al Amin
dc.creatorHosain, Al Amin
dc.date.accessioned2022-08-03T20:18:28Z
dc.date.available2022-08-03T20:18:28Z
dc.date.issued2021
dc.description.abstractSign language is the primary form of communication among Deaf and Hard of Hearing (DHH) individuals. Because they depend on speech, voice-controlled assistants such as Apple Siri or Amazon Alexa are not readily accessible to DHH individuals. An automated sign language recognizer can serve as an interface between a DHH individual and voice-controlled digital devices. Recognizing word-level sign gestures is the first step of an automated sign language recognition system. These gestures are characterized by fast, highly articulated motion of the upper body, including arm movements with complex hand shapes. The primary challenge of a word-level sign language recognizer (WLSLR) is to capture the hand shapes and their motion components. Additional challenges arise from the resolution of the available video, differences in gesture speed, and large variations in gesture-performing style across subjects. In this dissertation, we study different methods with the goal of improving video-based WLSLR systems. Toward this goal, we introduced a multi-modal American Sign Language (ASL) dataset, GMU-ASL51. This publicly available dataset features multiple modalities and 13,107 word-level ASL sign videos. We implemented machine learning methods using video input alone and a fusion of video and body-pose data. Word-level sign videos typically vary in length, roughly from 10 to 200 frames, depending on the source and type of the sign video. To utilize the frame-wise representation of hand shapes, we implemented Recurrent Neural Network (RNN) models using per-frame hand-shape features extracted from a pre-trained Convolutional Neural Network (CNN). To further improve hand-shape representation, we proposed a hand-shape annotation method that can quickly annotate hand-shape images while simultaneously training a CNN model.
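The RNN-over-CNN-features pipeline described in the abstract can be illustrated with a minimal toy sketch. This is not the dissertation's actual architecture: the feature dimension, hidden size, random (untrained) weights, and the simple Elman recurrence are all illustrative assumptions; only the 51-class setting echoes GMU-ASL51.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 512-d per-frame CNN hand-shape features,
# hidden size 64, 51 sign classes (echoing GMU-ASL51).
FEAT, HID, CLASSES = 512, 64, 51

# Randomly initialized weights stand in for trained parameters.
W_in = rng.standard_normal((FEAT, HID)) * 0.01
W_h = rng.standard_normal((HID, HID)) * 0.01
W_out = rng.standard_normal((HID, CLASSES)) * 0.01

def classify_sign(frame_features):
    """Run a simple Elman RNN over per-frame CNN features and
    classify the sign from the final hidden state."""
    h = np.zeros(HID)
    for x in frame_features:  # one 512-d feature vector per video frame
        h = np.tanh(x @ W_in + h @ W_h)
    logits = h @ W_out
    return int(np.argmax(logits))

# Videos vary in length (roughly 10-200 frames in the data described above);
# the recurrence handles any sequence length without padding.
video = rng.standard_normal((37, FEAT))
label = classify_sign(video)
```

Because the hidden state is updated frame by frame, the same model accepts videos of any length, which is the property that makes RNNs a natural fit for the variable-length sign videos described above.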
We later used this model as a hand-shape feature extractor for the downstream sign recognition task. Most of the information in sign language is conveyed through hand and arm movements. To prioritize hand- and arm-related features, we proposed a pose-guided feature localization method that operates on the 3D feature maps of a 3D CNN model. This method tracks the location of the hands in feature-map space and extracts representative hand features from a sign video. To further leverage the idea of hand representation, we developed a graph-based hand model. This formulation treats the hands as graphs and models the finger structure using a Graph Convolutional Network (GCN). When combined with existing models in an ensemble, the graph-based modeling yielded additional recognition gains. In an attempt to build an interface between DHH individuals and voice assistants, this dissertation presents the building blocks of a video-based WLSLR, ranging from the development of a multi-modal dataset to improvements over state-of-the-art video classification models. We demonstrate the roles of hand shapes and pose data in several contexts of sign video modeling. We anticipate that the data and the insights that emerged from this work will help advance research toward an automated sign language interpreter.
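The graph-based hand formulation can likewise be sketched with a single graph-convolution layer in numpy. The 21-joint hand skeleton, the edge list, and the Kipf-and-Welling-style symmetric normalization are assumptions for illustration; the dissertation's GCN details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 21-joint hand skeleton (wrist + 4 joints per finger),
# with edges following the finger bone structure.
N_JOINTS = 21
edges = [(0, i) for i in (1, 5, 9, 13, 17)]  # wrist to finger bases
edges += [(b + k, b + k + 1) for b in (1, 5, 9, 13, 17) for k in range(3)]

# Adjacency with self-loops, symmetrically normalized.
A = np.eye(N_JOINTS)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))

def gcn_layer(X, W):
    """One graph convolution: aggregate each joint's neighbors via the
    normalized adjacency, project with W, then apply ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)

# 3-d joint coordinates in, 16-d per-joint embeddings out.
X = rng.standard_normal((N_JOINTS, 3))
W = rng.standard_normal((3, 16)) * 0.1
H = gcn_layer(X, W)  # shape (21, 16)
```

Each row of `H` is an embedding of one joint that mixes in information from its neighboring joints, which is how a GCN can capture finger structure rather than treating joint coordinates independently.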
dc.format.extent133 pages
dc.identifier.urihttps://hdl.handle.net/1920/12925
dc.language.isoen
dc.rightsCopyright 2021 Al Amin Hosain
dc.subjectComputer science
dc.subjectCNN
dc.subjectHuman Pose Data
dc.subjectNeural Networks
dc.subjectRNN
dc.subjectSign Language
dc.subjectVideo Modeling
dc.titleWord-Level Sign Language Recognition from Videos
dc.typeDissertation
thesis.degree.disciplineComputer Science
thesis.degree.grantorGeorge Mason University
thesis.degree.levelPh.D.
thesis.degree.namePh.D. in Computer Science

Files

Original bundle
Name:
Hosain_gmu_0883E_12700.pdf
Size:
7.56 MB
Format:
Adobe Portable Document Format