AI-assisted computer vision research aims to improve accessibility for deaf, hard of hearing

Digital assistants like Amazon’s Alexa aren’t currently useful for the deaf and hard of hearing community. George Mason University researchers led by Jana Košecká are making the Internet of Things more inclusive and accessible to those for whom it has not been designed. For the next year, her work to improve “seeing” computer systems that translate continuous American Sign Language into English will be funded by Amazon’s Fairness in AI Research Program.

Jana Košecká. Photo by Ron Aira/Creative Services

Having worked at Mason for more than 20 years, Košecká began studying computer vision as it applies to American Sign Language in 2019 with colleagues Huzefa Rangwala and Parth Pathak in collaboration with Gallaudet University. Their work resulted in three academic publications on the topic in 2020. The team’s initial work focused on computer vision recognizing American Sign Language at the word level.  

Košecká describes her current work as a continuation of earlier work, but now, especially with the help of AI, she’s tackling more complex ASL content, such as sentence-level communication, facial expressions, and very specific hand gesticulation.

“The challenge of extending some of these ideas [of computer translation] to American Sign Language translation is the input is video as opposed to text; it's continuous, and you have a lot of challenges, because you have a lot of variations about how people sign,” says Košecká.

The project is accordingly multifaceted. “We are focusing on better hand modeling, focusing on incorporating the facial features and extending to continuous sign language, so you can have short phrases the model can translate to English,” Košecká explains. “We are basically trying to capture continuous sign language and not just individual words." 

To accomplish this goal, Košecká is using weakly supervised machine learning methods, which provide mechanisms to teach the system without excessive human labeling effort.

“Weakly supervised learning techniques don’t need perfect alignment of video sequences that contain multiple words,” she says. “In word-level recognition, the model is presented with examples of a video snippet of a signed word and the word text, so it has perfect supervision. Given many examples of the sign for ‘apple’ as a video snippet, the system will learn to recognize the word ‘apple.’”
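For a concrete picture of this fully supervised, word-level setting, the sketch below pairs each short video snippet with exactly one word label. The architecture, feature dimensions, and vocabulary size are illustrative assumptions, not the team’s actual model.

```python
# A minimal, illustrative sketch (assumed architecture, not the team's
# published model): word-level recognition with "perfect supervision",
# where each short video snippet is paired with exactly one word label.
import torch
import torch.nn as nn

class WordLevelSignClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, vocab_size=1000):
        super().__init__()
        # Per-frame features (e.g. from a pose estimator or CNN) are
        # summarized by a recurrent encoder, then scored against the
        # word vocabulary.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_features):             # (batch, frames, feat_dim)
        _, last_hidden = self.encoder(frame_features)
        return self.classifier(last_hidden[-1])    # (batch, vocab_size)

model = WordLevelSignClassifier()
snippets = torch.randn(8, 40, 512)                 # 8 snippets, 40 frames each
labels = torch.randint(0, 1000, (8,))              # one word label per snippet
loss = nn.CrossEntropyLoss()(model(snippets), labels)
loss.backward()                                    # standard supervised training step
```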

“There are some techniques which can discover patterns without this need of direct supervision. If you just give the model a lot of examples, the model will figure out repeating patterns of certain words occurring in certain contexts,” she says. “So we are applying these machine-learning techniques to the setting of American Sign Language.”  
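As an illustration of the weakly supervised setting she describes, the sketch below uses connectionist temporal classification (CTC) loss, a standard way to learn how a long sequence of video frames lines up with a sentence-level word sequence when no frame-by-frame labels exist. This is an assumed, generic example of the technique, not the project’s published method.

```python
# An illustrative sketch of the weakly supervised setting: a continuous
# signing video is paired only with its sentence-level word sequence,
# with no labels saying which frames belong to which word. CTC loss is
# one standard technique for learning such alignments; it is assumed
# here for illustration, not drawn from the team's publications.
import torch
import torch.nn as nn

vocab_size = 1000                        # assumed word vocabulary size
blank_id = 0                             # extra "blank" symbol required by CTC
encoder = nn.GRU(512, 256, batch_first=True)
head = nn.Linear(256, vocab_size + 1)    # +1 output for the blank symbol
ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

frames = torch.randn(4, 200, 512)                   # 4 videos, 200 frames of features
targets = torch.randint(1, vocab_size + 1, (4, 6))  # 6-word sentences (labels 1..vocab_size)
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 6, dtype=torch.long)

hidden, _ = encoder(frames)
log_probs = head(hidden).log_softmax(-1).transpose(0, 1)  # (frames, batch, vocab+1)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
# Over many such weakly labeled pairs, the model discovers which spans of
# frames correspond to which words, without any manual frame-level alignment.
```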

Relating her work to AI-powered chatbots like ChatGPT, Košecká says, “There has been a lot of headway made in this space for written and spoken languages, and we would like to make a little bit of headway for American Sign Language, using some of these insights and ideas.”

Košecká envisions her research helping improve the interface between hard of hearing people and their environment, whether that be when they’re communicating with Amazon’s Alexa or ordering at a restaurant counter. No doubt her work will help improve inclusivity and accessibility for the deaf and hard of hearing both at Mason and beyond.