Agam Goyal

agamg2 [at] illinois [dot] edu


Affiliated Groups

I am a first-year Computer Science Ph.D. student at the University of Illinois Urbana-Champaign, co-advised by Prof. Hari Sundaram and Prof. Eshwar Chandrasekharan. I also collaborate closely with Prof. Koustuv Saha. Previously, I was an undergraduate student at the University of Wisconsin-Madison majoring in Computer Science, Mathematics, and Data Science, and advised by Prof. Hanbaek Lyu and Prof. Junjie Hu.

My research focuses on aligning GenAI technologies with diverse human values and exploring questions around their safe integration in social contexts. I also study the application of NLP techniques more broadly in computational understanding, modeling, and governance of online communities. For more details on my research interests, see this page. My work is graciously supported in part by the Cohere For AI Research Grant Program and the OpenAI Researcher Access Program.

Currently, I am working as a Ph.D. Research Intern at Adobe Research, mentored by Dr. Apoorv Saxena and Dr. Koyel Mukherjee. I am working on developing continual learning techniques for LLMs with applications to agentic RAG settings.

If you are an undergraduate student interested in gaining research experience, please reach out to me at agamg2@illinois.edu. A strong background in ML/NLP and experience with PyTorch are highly recommended.

News

Jun 20, 2025 Our work, Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders, has been accepted to the Actionable Interpretability Workshop @ ICML!
Apr 25, 2025 Gave a talk on Detoxification of LLMs using SAEs at the AImpact Center @ UIUC. [Slides]
Jan 22, 2025 Our work on Small Language Models for Content Moderation has been accepted to NAACL 2025 (Main) as an Oral talk!
Oct 17, 2024 Two new preprints on Uncovering the Internet’s Hidden Values and Small Language Models for Content Moderation are now on arXiv!
Jun 20, 2024 Check out our new preprint on aligning LLM agents using belief networks, Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks, on arXiv.

Selected publications

  1. NAACL, ICLR’W
    Simulating Opinion Dynamics with Networks of LLM-based Agents
    Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, and Timothy T Rogers
    In Findings of the Association for Computational Linguistics: NAACL 2024, 2024
  2. NAACL Oral
    SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
    Xianyang Zhan*, Agam Goyal*, Yilun Chen, Eshwar Chandrasekharan, and Koustuv Saha
    In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
  3. arXiv
    Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
    Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, and Hari Sundaram
    arXiv preprint arXiv:2505.14536, May 2025
  4. arXiv
    MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
    Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, and Eshwar Chandrasekharan
    arXiv preprint arXiv:2505.14483, May 2025