Agam Goyal
agamg2 [at] illinois [dot] edu

I’m a second-year Computer Science Ph.D. student at the University of Illinois Urbana–Champaign, co-advised by Prof. Hari Sundaram and Prof. Eshwar Chandrasekharan. I also collaborate closely with Prof. Koustuv Saha. Previously, I was an undergraduate at the University of Wisconsin–Madison, majoring in Computer Science, Mathematics, and Data Science, where I was advised by Prof. Hanbaek Lyu and Prof. Junjie Hu.
I study sociotechnical systems (especially LLMs) and how they shape social interactions. My work weaves together three threads:
- Understanding models for safety using mechanistic interpretability and machine unlearning to reveal what they encode, diagnose failure modes, and remove or modify undesirable behavior;
- Modeling social interaction in human–human and human–AI settings using NLP and causal inference to explain linguistic phenomena and estimate causal effects;
- Improving outcomes by designing systems that translate these insights into practice and evaluating them with human-centered methods.
My goal with this “explain → model → intervene” loop is to build principled, deployable methods that help us understand and improve real-world social interactions.
My work is supported in part by the Cohere For AI Research Grant Program and the OpenAI Researcher Access Program.
Last summer, I worked as a Research Intern at Adobe Research, mentored by Dr. Apoorv Saxena and Dr. Koyel Mukherjee.
If you’re an undergraduate interested in research experience, feel free to reach out: agamg2@illinois.edu. A strong background in ML/NLP and experience with PyTorch are highly recommended.
News
Aug 20, 2025 | Three papers, on LLM-based content moderation, LLM detoxification using sparse autoencoders, and a new, challenging argument summarization dataset, have been accepted to EMNLP 2025 (Main)!
Jul 15, 2025 | Our work Uncovering the Internet’s Hidden Values has been accepted to ICWSM 2026!
Jun 20, 2025 | Our work Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders has been accepted to the Actionable Interpretability Workshop @ ICML!
Apr 25, 2025 | Gave a talk on Detoxification of LLMs using SAEs at the AImpact Center @ UIUC. [Slides]
Jan 22, 2025 | Our work on Small Language Models for Content Moderation has been accepted to NAACL 2025 (Main) as an Oral talk!