gpt-image-2

Infographics & Data

Comparison Chart

Frontier LLM Benchmark Accuracy Comparison Heatmap

Prompt

16:9 landscape orientation, high-resolution, clean, professional data-visualization heatmap matrix comparing frontier LLMs against standard benchmarks, on a plain white, uncluttered background, with crisp, legible sans-serif text throughout.
Top X-axis column labels: rotated 45 degrees clockwise, reading "MMLU", "HumanEval", "GSM8K", "MATH", "BBH", "ARC-C", "HellaSwag", "TruthfulQA".
Left Y-axis row labels: right-aligned, using fully fictional model names reading "Aster-4o", "Orchid Opus", "ImageForge-2", "Atlas 405B", "Qilin-Next", "DeepRiver-V3", "Meridian Large", "YiLan 34B", "Phiro 14B", "OpenMeadow 7B".
Each matrix cell is filled with a dusty teal gradient proportional to accuracy score (pale light dusty teal for low scores, deep saturated dusty teal for high scores), with a small centered high-contrast numeric accuracy value formatted as e.g. "72.3" or "88.1" inside each cell. The highest-scoring cell per individual column is outlined with a 1.5px soft terracotta border.
A vertical color bar legend sits on the far right of the matrix, matching the dusty teal gradient scale, with clearly labeled ticks for "0", "25", "50", "75", "100", and the label "accuracy (%)" printed at the bottom of the bar.
Top header: large bold main title reading "Benchmark comparison across 10 frontier LLMs", with a smaller lighter-weight subtitle directly below reading "zero-shot accuracy; best per benchmark outlined in terracotta. Evaluated March 2026."
Print-ready sharp quality, no blurry labels, no distortion, no extra decorative elements.
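
The shading and highlighting rules in the prompt can also be written down as a small Python sketch, which may help when checking a generated image against the spec. This is an illustration under stated assumptions, not part of the prompt: the two hex endpoints for the "dusty teal" ramp, the function names, and the score values are all hypothetical placeholders (the prompt gives no colour codes and no data), and the ramp is assumed to be linear over 0 to 100.

```python
from typing import List, Tuple

# Hypothetical endpoint colours for the "dusty teal" ramp.
# The prompt names the hue only; these RGB values are assumptions.
PALE_TEAL: Tuple[int, int, int] = (221, 235, 233)   # low scores
DEEP_TEAL: Tuple[int, int, int] = (47, 110, 107)    # high scores


def teal_for_score(score: float) -> str:
    """Map an accuracy in [0, 100] to a hex colour by linear interpolation."""
    t = min(max(score, 0.0), 100.0) / 100.0  # clamp, then normalise to [0, 1]
    rgb = tuple(round(lo + (hi - lo) * t) for lo, hi in zip(PALE_TEAL, DEEP_TEAL))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def best_row_per_column(scores: List[List[float]]) -> List[int]:
    """For each column, return the index of the row holding the highest score.

    Ties resolve to the first (topmost) row, because max() keeps the first maximum.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    return [max(range(n_rows), key=lambda r: scores[r][c]) for c in range(n_cols)]


def cell_label(score: float) -> str:
    """Format a score with one decimal place, matching the "72.3" style."""
    return f"{score:.1f}"


if __name__ == "__main__":
    # Placeholder values only; they are NOT benchmark results.
    models = ["Aster-4o", "Orchid Opus", "ImageForge-2"]
    benchmarks = ["MMLU", "HumanEval"]
    scores = [
        [72.3, 88.1],
        [80.0, 60.5],
        [65.2, 90.4],
    ]
    winners = best_row_per_column(scores)
    for c, bench in enumerate(benchmarks):
        for r, model in enumerate(models):
            outline = "terracotta outline" if winners[c] == r else "no outline"
            print(bench, model, cell_label(scores[r][c]),
                  teal_for_score(scores[r][c]), outline)
```

Exactly one cell per column receives the outline, which matches the "highest-scoring cell per individual column" wording; a prompt that wanted ties outlined together would need a different rule.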