Announcement_16
Our work, "Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders," has been accepted to the Actionable Interpretability Workshop @ ICML!