A benchmark for evaluating the risk of neural language model degeneration into toxic language when given varying prompts.
Gehman et al.
Expected maximum toxicity