Waris Gill

Redmond, USA
waris@vt.edu
LinkedIn
GitHub
CV

Hi, I’m Waris Gill, a Senior Applied Scientist at Microsoft working on safety and security for generative AI systems (e.g., Azure OpenAI).

I completed my PhD in Computer Science at Virginia Tech, where I focused on interpretability techniques for distributed privacy-preserving AI systems and methods for evaluating Large Language Models in complex software engineering tasks.

During my PhD, I collaborated with industry on LLM infrastructure: semantic caching systems with Cisco (MeanCache) and embedding models for Redis (v1, v2) that are now widely used in production systems.

My work has been published at MLSys, ICSE, FSE, MSR, IPDPS, and SaTML.

news

Feb 07, 2026	My work, ProToken, has been accepted at MLSys 2026. ProToken provides token-level attribution for federated LLMs, enabling interpretability in distributed AI systems.
Dec 10, 2025	My work at Microsoft, BinaryShield, has been accepted at SaTML 2026. BinaryShield enables privacy-preserving threat intelligence sharing to detect prompt injection attacks across LLM services. Microsoft filed a patent for this work.
May 27, 2025	I started my internship at Microsoft as an Applied Scientist in the AI Safety team.
Apr 03, 2025	My research at Redis on compact and efficient embeddings for semantic caching has been open-sourced (read the paper here). The redis/langcache-embed-v1 and redis/langcache-embed-v2 models have surpassed 60K and 72K downloads on Hugging Face, respectively. Both are optimized for semantic caching in LLM services.
Mar 26, 2025	Delivered a talk on TraceFL, an interpretability technique for federated learning based on neuron provenance, at the Flower AI Summit 2025, the world’s largest federated AI conference, held in London, UK. The talk is available at this link. [Slides]
Jan 30, 2025	Our work on restoring Jupyter Notebooks has been accepted at MSR-2025. Congratulations, Tien!
Jan 21, 2025	I started my internship at Redis as a Machine Learning Engineer in the Redis AI team.
Dec 19, 2024	My work on MeanCache, in collaboration with Cisco, has been accepted at IPDPS-2025. MeanCache is a semantic cache for LLM services.
Nov 01, 2024	The baseline of our paper, FedDebug, for debugging malicious/faulty clients in Federated Learning is available in the Flower AI framework. Check out the code here.
Oct 31, 2024	My work, TraceFL, has been accepted at ICSE-2025 (acceptance rate ~10% [132/1219]). TraceFL addresses the open challenge of interpretability in federated learning using neuron provenance.
Oct 03, 2024	Presented MeanCache, a semantic cache for LLMs, at the Amazon - Virginia Tech Initiative for Efficient and Robust ML. Selected as one of 18 participants for the poster presentation. [Poster] [Paper]
Aug 19, 2024	Serving as a program committee member on the artifact evaluation track for the 47th International Conference on Software Engineering (ICSE) 2025.
Aug 07, 2024	Delivered an invited talk on Achieving Debugging and Interpretability in Federated Learning Systems at Flower AI, a premier platform for federated learning. [Slides]
Dec 04, 2023	Presented our paper, FedDefender, during the SE4SafeML event at FSE-2023 in San Francisco, California.
Sep 20, 2023	My work at Cisco was open-sourced (Link).
May 22, 2023	I started working at Cisco with Shannon and Pallavi.
May 14, 2023	I received a National Science Foundation (NSF) award to present our paper, FedDebug, at ICSE-2023 in Melbourne, Australia. [Slides]

selected publications

ProToken: Token-Level Attribution for Federated Large Language Models

Waris Gill, Ahmad Humayun, Ali Anwar, and Muhammad Ali Gulzar

In Proceedings of Machine Learning and Systems (MLSys), 2026

Bib PDF

@inproceedings{gill2026protoken,
  author = {Gill, Waris and Humayun, Ahmad and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {Proceedings of Machine Learning and Systems (MLSys)},
  publisher = {MLSys},
  title = {{ProToken: Token-Level Attribution for Federated Large Language Models}},
  url = {https://arxiv.org/abs/2601.19672},
  year = {2026}
}

BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

Waris Gill, Natalie Isak, and Matthew Dressman

In 2026 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2026

Bib PDF

@inproceedings{gill2026binaryshield,
  author = {Gill, Waris and Isak, Natalie and Dressman, Matthew},
  booktitle = {2026 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)},
  organization = {IEEE},
  title = {{BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints}},
  url = {https://arxiv.org/abs/2509.05608},
  year = {2026}
}

TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance

Waris Gill, Ali Anwar, and Muhammad Ali Gulzar

In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025

Bib PDF Code

@inproceedings{gill2025tracefl,
  author = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  keywords = {interpretability;explainability;debugging;federated learning;transformer;machine learning},
  organization = {IEEE},
  title = {{TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance}},
  year = {2025}
}

MeanCache: User-Centric Semantic Caching for LLM Web Services

Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ali Anwar, and Muhammad Ali Gulzar

In 2025 IEEE 39th International Parallel & Distributed Processing Symposium (IPDPS), 2025

Bib PDF

@inproceedings{gill2025MeanCache,
  author = {Gill, Waris and Elidrisi, Mohamed and Kalapatapu, Pallavi and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {{2025 IEEE 39th International Parallel \& Distributed Processing Symposium (IPDPS)}},
  organization = {IEEE},
  title = {{MeanCache: User-Centric Semantic Caching for LLM Web Services}},
  year = {2025}
}

Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data

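Waris Gill, Justin Cechmanek, Tyler Hutcherson, Srijith Rajamohan, Jen Agarwal, Muhammad Ali Gulzar, Manvinder Singh, and Benoit Dion

arXiv preprint arXiv:2504.02268, 2025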

Bib PDF Code

@misc{gill2025advancingsemanticcachingllms,
  archiveprefix = {arXiv},
  author = {Gill, Waris and Cechmanek, Justin and Hutcherson, Tyler and Rajamohan, Srijith and Agarwal, Jen and Gulzar, Muhammad Ali and Singh, Manvinder and Dion, Benoit},
  eprint = {2504.02268},
  primaryclass = {cs.LG},
  title = {Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data},
  url = {https://arxiv.org/abs/2504.02268},
  year = {2025}
}

Are the Majority of Public Computational Notebooks Pathologically Non-Executable?

Tien Nguyen, Waris Gill, and Muhammad Ali Gulzar

In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 2025

Bib PDF Code

@inproceedings{nguyen2025majority,
  author = {Nguyen, Tien and Gill, Waris and Gulzar, Muhammad Ali},
  booktitle = {{2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)}},
  title = {Are the Majority of Public Computational Notebooks Pathologically Non-Executable?},
  year = {2025}
}

How Accurately Do Large Language Models Understand Code?

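Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad Taha Khan, and Muhammad Ali Gulzar

arXiv preprint arXiv:2504.04372, 2025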

Bib PDF

@misc{haroon2025accuratelylargelanguagemodels,
  archiveprefix = {arXiv},
  author = {Haroon, Sabaat and Khan, Ahmad Faraz and Humayun, Ahmad and Gill, Waris and Amjad, Abdul Haddi and Butt, Ali R. and Khan, Mohammad Taha and Gulzar, Muhammad Ali},
  eprint = {2504.04372},
  primaryclass = {cs.SE},
  title = {How Accurately Do Large Language Models Understand Code?},
  url = {https://arxiv.org/abs/2504.04372},
  year = {2025}
}

FedDebug: Systematic Debugging for Federated Learning Applications

Waris Gill, Ali Anwar, and Muhammad Ali Gulzar

In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023

Bib PDF Code

@inproceedings{gill2023FedDebug,
  author = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)},
  issn = {1558-1225},
  pages = {512--523},
  title = {{FedDebug: Systematic Debugging for Federated Learning Applications}},
  year = {2023}
}

FedDefender: Backdoor Attack Defense in Federated Learning

Waris Gill, Ali Anwar, and Muhammad Ali Gulzar

In Proceedings of the 1st International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components, San Francisco, CA, USA, 2023

Bib PDF Code

@inproceedings{gill2023FedDefender,
  address = {New York, NY, USA},
  author = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {Proceedings of the 1st International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components},
  isbn = {9798400703799},
  keywords = {fault localization, testing, differential testing, poisoning attack, federated learning, backdoor attack, deep learning},
  location = {San Francisco, CA, USA},
  numpages = {4},
  pages = {6--9},
  publisher = {Association for Computing Machinery},
  series = {SE4SafeML 2023},
  title = {{FedDefender: Backdoor Attack Defense in Federated Learning}},
  year = {2023}
}