Gray Zone Alignment

DeepSeek's models cannot reliably translate sensitive content. This heatmap shows how this behavior has evolved over the last year. Click individual heatmap cells to view specific examples. Each translation was attempted nine times across three different temperatures and was graded with GPT-4o at temperature 0.65.

Model Temperature:
Task Type:
Search translations:
0%
25%
50%
75%
100%
About This Visualization