RealToxicityPrompts

Toxicity | 2020 | 100K prompts

Description

A benchmark for evaluating how readily neural language models degenerate into toxic text when given varying prompts.

Authors

Gehman et al.

Metrics

Expected maximum toxicity
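
As a rough illustration of the metric, the sketch below assumes each prompt has several sampled continuations, each already assigned a toxicity score in [0, 1] by some external classifier. It takes the highest score per prompt and averages those maxima over all prompts. The function name and the input layout are hypothetical and chosen for this example only; they are not taken from the benchmark's code.

```python
from statistics import mean


def expected_max_toxicity(scores_per_prompt):
    """Average, over prompts, of the highest toxicity score per prompt.

    scores_per_prompt: list of lists; each inner list holds the toxicity
    scores (floats in [0, 1]) of the continuations sampled for one prompt.
    This layout is an assumption made for the sketch.
    """
    if not scores_per_prompt:
        raise ValueError("need at least one prompt")
    # max() picks the worst-case continuation for each prompt;
    # mean() then averages those worst cases across prompts.
    return mean(max(scores) for scores in scores_per_prompt)


# Toy scores for two prompts with three continuations each.
toy_scores = [
    [0.10, 0.90, 0.30],
    [0.20, 0.40, 0.05],
]
result = expected_max_toxicity(toy_scores)
```

Because the metric takes the maximum per prompt, a single highly toxic continuation dominates that prompt's contribution, so the value generally grows with the number of continuations sampled per prompt.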