ECE PhD student reveals the security risks of explainable AI 

In a world increasingly reliant on artificial intelligence (AI), the quest to understand these complex systems has led to the emergence of explainable AI (XAI). However, ECE doctoral student Sreenitha Kasarapu shed light on the unforeseen consequences of this pursuit through her project, “Leveraging Explainable AI for Designing Evolvable Malware.”

Kasarapu, advised by Sai Manoj Pudukotai Dinakarrao, presented her project along with a second at the Commonwealth Cyber Initiative’s 2024 Symposium.

"So we use explainable AI methods... but as we are getting estimations and analysis of what's happening inside the neural network model, we also have the opportunity for attackers to exploit that information and craft malware using these explainable AI methods," she said. 

Kasarapu and the two posters she presented at the Commonwealth Cyber Initiative's 2024 Symposium

By leveraging traditional XAI methods such as LIME, SHAP, Integrated Gradients, or other gradient-based techniques, attackers can identify the key features an AI model relies on and manipulate them to subvert its intended functionality. Kasarapu demonstrated this by using XAI techniques to pinpoint the features most responsible for a model's verdict and perturb them strategically. The result was malware that the model misclassified as benign.
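
The attack pattern described above can be sketched in miniature. This is not Kasarapu's actual pipeline: the data is synthetic, the detector is a toy logistic-regression model, and gradient-times-input attribution stands in for SHAP or Integrated Gradients. The "incriminating features" (indices 0-4) and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: binary feature vectors (e.g., API-call indicators),
# label 1 = malware. Features 0-4 are the "incriminating" ones that
# malware samples activate far more often than benign software.
n, d = 400, 20
X_benign = (rng.random((n, d)) < 0.2).astype(float)
X_malware = (rng.random((n, d)) < 0.2).astype(float)
X_malware[:, :5] = (rng.random((n, 5)) < 0.9).astype(float)
X = np.vstack([X_benign, X_malware])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Train a toy logistic-regression "detector" by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * (p - y).mean()

def predict(x):
    """Probability the detector assigns to 'malware'."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A malware sample with all five incriminating features present.
sample = np.zeros(d)
sample[:5] = 1.0

# Step 1: explain. Gradient-times-input attribution (a simple stand-in
# for SHAP / Integrated Gradients) scores how much each present feature
# pushes the sample toward the "malware" verdict.
attribution = w * sample

# Step 2: perturb. Suppress the most incriminating features (e.g., by
# obfuscating those API calls) to craft an evasive variant.
top = np.argsort(attribution)[::-1][:5]
evasive = sample.copy()
evasive[top] = 0.0

print(f"original sample: P(malware) = {predict(sample):.2f}")
print(f"evasive variant: P(malware) = {predict(evasive):.2f}")
```

Zeroing the few features the explanation flags is enough to flip the toy detector's verdict from malware to benign, which is exactly the leverage an attacker gains when a model's internals are made interpretable.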

"The malware is classified as benign; it can steal user information for a long period of time until detected as malware," she explained. 

Kasarapu’s aim was not only to exploit XAI, but also to demonstrate how it can be weaponized against itself, creating a new breed of malware capable of bypassing conventional security measures. Through her work, Kasarapu revealed how XAI can unwittingly empower malicious actors to develop evolvable malware. 

Because her crafted malware degraded the AI model's ability to recognize malware in general, Kasarapu demonstrated how XAI methods could progressively weaken a model. After her attack, the model's accuracy in classifying malware fell below 50%, a significant drop from its baseline accuracy of 90%.

With her project, Kasarapu illuminated a sobering reality: the pursuit of transparency within AI systems, while noble, may inadvertently open Pandora's box, empowering adversaries with the means to develop progressively damaging malware.