Computer science expert using natural language processing to improve equality in language technologies

Computer science researcher Antonis Anastasopoulos uses his love for computer science, language, and linguistics to improve equality in language technologies.

When people ask Siri, Alexa, or Google Assistant a question, they expect the programs to understand them, but that is not always the case, he says.

Antonis Anastasopoulos. Photo provided.

A person’s language, accent, dialect, and even gender can prevent these systems from interpreting them correctly, says Anastasopoulos, an assistant professor in the Department of Computer Science and an expert in natural language processing, the field concerned with how computers process and understand human language.

“The systems don’t work equally well for everyone,” says Anastasopoulos, who speaks Greek (his native language), English, German, Swedish, Italian, and some Spanish.

He is one of several co-principal investigators who received a new National Science Foundation-Amazon grant for their research, “Quantifying and Mitigating Disparities in Language Technologies.”

In the fall, Anastasopoulos also won a 2020 Google Award for Inclusion Research for a project on how accent and dialect affect language technologies.

For the NSF grant, he and experts from Carnegie Mellon University and the University of Washington are identifying where bias exists in language technologies and measuring the resulting disparities. They will then attempt to mitigate those inequalities.

“We want to measure the extent to which the diversity of language affects the utility that speakers get from language technologies,” Anastasopoulos says. “We will focus on automatic translation and speech recognition since they are perhaps the most commonly used language technologies throughout the world.”

The research is meant to apply to all languages. It is important to look closely at the variation within each language, he says, because languages are flexible and diverse. “There are many regional variations that are different from the standard.”

He also recently received a $350,000 grant from the National Endowment for the Humanities (NEH) to build optical character recognition (OCR) tools for endangered languages, which convert scanned images of text into a machine-readable format.

“We are working on training machine-learning models to process images and texts in the books and documents of indigenous languages from Central and South America so that these works can be made accessible to everyone,” he says. “We are building technologies to study those languages computationally.”

Anastasopoulos is also part of a prestigious group of machine-translation researchers, including experts from Facebook, Google, Amazon, and Microsoft, who are creating automatic tools that translate COVID-19-related content for communities where people don’t speak the languages most often used by large health organizations such as the World Health Organization.

“We are working closely with Translators without Borders,” he says. “So far, we have produced terminologies for more than 200 languages and a large dataset for 35 languages, some of them extremely under-served by the current solution.”
