
Qwen/Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507 is the latest Qwen3 thinking model, scaling up the thinking capability of Qwen3-235B-A22B and improving both the quality and depth of reasoning.

  • Visibility: Public
  • Price: $0.13/$0.60 per Mtoken (input/output)
  • Quantization: fp8
  • Context length: 262,144 tokens
  • Supports: JSON mode, function calling
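
As a rough illustration of how such a hosted endpoint is typically called, here is a hedged sketch using an OpenAI-compatible client; the base URL and API-key environment variable are assumptions for illustration, not details confirmed by this page:

```python
# Hedged sketch: querying the hosted model via an OpenAI-compatible endpoint.
# The base_url and environment variable below are assumptions, not confirmed here.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key=os.environ["DEEPINFRA_API_KEY"],         # assumed env var name
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=8192,  # leave headroom for the model's thinking tokens
)
print(response.choices[0].message.content)
```

At the listed rates, a call with 1,000 input tokens and 10,000 output tokens (thinking models generate long outputs) would cost roughly 0.001 × $0.13 + 0.01 × $0.60 ≈ $0.0061.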

Qwen3-235B-A22B-Thinking-2507 is the latest Qwen3 thinking model, scaling up the thinking capability of Qwen3-235B-A22B and improving both the quality and depth of reasoning. It features the following key enhancements:

  • Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise — achieving state-of-the-art results among open-source thinking models.
  • Markedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences.
  • Enhanced 256K long-context understanding capabilities.

NOTE: This version has an increased thinking length. We strongly recommend its use in highly complex reasoning tasks.

Model Overview

Qwen3-235B-A22B-Thinking-2507 has the following features:

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Number of Parameters: 235B in total and 22B activated
  • Number of Parameters (Non-Embedding): 234B
  • Number of Layers: 94
  • Number of Attention Heads (GQA): 64 for Q and 4 for KV
  • Number of Experts: 128
  • Number of Activated Experts: 8
  • Context Length: 262,144 tokens natively (a brief back-of-the-envelope sketch of these figures follows below).
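
To make the list above concrete, here is a small illustrative calculation of the MoE sparsity and the KV-cache saving implied by the GQA head counts. This is a sketch, not part of the official model card:

```python
# Back-of-the-envelope sketch of the specs above (illustrative only).
total_params  = 235e9   # total parameters
active_params = 22e9    # parameters activated per token

experts, active_experts = 128, 8
q_heads, kv_heads = 64, 4

# MoE sparsity: only 8 of 128 experts run for each token.
print(f"Experts used per token: {active_experts}/{experts} = {active_experts / experts:.1%}")
print(f"Active parameter fraction: {active_params / total_params:.1%}")

# GQA: 64 query heads share 4 key/value heads, so the KV cache is
# 64 / 4 = 16x smaller than multi-head attention with 64 KV heads.
print(f"KV-cache reduction vs. full MHA: {q_heads // kv_heads}x")
```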

NOTE: This model supports only thinking mode.

Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.
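
Because the template already opens the think block, client code only needs to locate the closing tag. Here is a minimal sketch with Hugging Face transformers, assuming hardware that can host a 235B MoE checkpoint; the `</think>` token id (151668) follows the published Qwen3 tokenizer and should be verified against your checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-235B-A22B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=32768)
output_ids = generated[0][len(inputs.input_ids[0]):].tolist()

# The template already opened <think>, so we only search for the closing tag.
try:
    idx = len(output_ids) - output_ids[::-1].index(151668)  # 151668 = </think> (verify)
except ValueError:
    idx = 0  # no closing tag found; treat the whole output as thinking content

thinking = tokenizer.decode(output_ids[:idx], skip_special_tokens=True).strip("\n")
answer = tokenizer.decode(output_ids[idx:], skip_special_tokens=True).strip("\n")
print("thinking:", thinking)
print("answer:", answer)
```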

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Performance

| Benchmark | Deepseek-R1-0528 | OpenAI O4-mini | OpenAI O3 | Gemini-2.5 Pro | Claude4 Opus Thinking | Qwen3-235B-A22B Thinking | Qwen3-235B-A22B-Thinking-2507 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Knowledge** | | | | | | | |
| MMLU-Pro | 85.0 | 81.9 | 85.9 | 85.6 | - | 82.8 | 84.4 |
| MMLU-Redux | 93.4 | 92.8 | 94.9 | 94.4 | 94.6 | 92.7 | 93.8 |
| GPQA | 81.0 | 81.4* | 83.3* | 86.4 | 79.6 | 71.1 | 81.1 |
| SuperGPQA | 61.7 | 56.4 | - | 62.3 | - | 60.7 | 64.9 |
| **Reasoning** | | | | | | | |
| AIME25 | 87.5 | 92.7* | 88.9* | 88.0 | 75.5 | 81.5 | 92.3 |
| HMMT25 | 79.4 | 66.7 | 77.5 | 82.5 | 58.3 | 62.5 | 83.9 |
| LiveBench 20241125 | 74.7 | 75.8 | 78.3 | 82.4 | 78.2 | 77.1 | 78.4 |
| HLE | 17.7# | 18.1* | 20.3 | 21.6 | 10.7 | 11.8# | 18.2# |
| **Coding** | | | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 68.7 | 71.8 | 58.6 | 72.5 | 48.9 | 55.7 | 74.1 |
| CFEval | 2099 | 1929 | 2043 | 2001 | - | 2056 | 2134 |
| OJBench | 33.6 | 33.3 | 25.4 | 38.9 | - | 25.6 | 32.5 |
| **Alignment** | | | | | | | |
| IFEval | 79.1 | 92.4 | 92.1 | 90.8 | 89.7 | 83.4 | 87.8 |
| Arena-Hard v2$ | 72.2 | 59.3 | 80.8 | 72.5 | 59.1 | 61.5 | 79.7 |
| Creative Writing v3 | 86.3 | 78.8 | 87.7 | 85.9 | 83.8 | 84.6 | 86.1 |
| WritingBench | 83.2 | 78.4 | 85.3 | 83.1 | 79.1 | 80.3 | 88.3 |
| **Agent** | | | | | | | |
| BFCL-v3 | 63.8 | 67.2 | 72.4 | 67.2 | 61.8 | 70.8 | 71.9 |
| TAU2-Retail | 64.9 | 71.0 | 76.3 | 71.3 | - | 40.4 | 71.9 |
| TAU2-Airline | 60.0 | 59.0 | 70.0 | 60.0 | - | 30.0 | 58.0 |
| TAU2-Telecom | 33.3 | 42.0 | 60.5 | 37.4 | - | 21.9 | 45.6 |
| **Multilingualism** | | | | | | | |
| MultiIF | 63.5 | 78.0 | 80.3 | 77.8 | - | 71.9 | 80.6 |
| MMLU-ProX | 80.6 | 79.0 | 83.3 | 84.7 | - | 80.0 | 81.0 |
| INCLUDE | 79.4 | 80.8 | 86.6 | 85.1 | - | 78.7 | 81.0 |
| PolyMATH | 46.9 | 48.7 | 49.7 | 52.2 | - | 54.7 | 60.1 |

* For OpenAI O4-mini and O3, we use a medium reasoning effort, except for scores marked with *, which are generated using high reasoning effort.

# According to the official evaluation criteria of HLE, scores marked with # refer to models that are not multi-modal and were evaluated only on the text-only subset.

$ For reproducibility, we report the win rates evaluated by GPT-4.1.

& For highly challenging tasks (including PolyMATH and all reasoning and coding tasks), we use an output length of 81,920 tokens. For all other tasks, we set the output length to 32,768.
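
As a rough illustration of applying those caps in practice, one might set the generation budget per task family as below; this is a sketch, with the task grouping inferred from the note above and no claim about the official evaluation harness:

```python
# Illustrative only: output-length caps from the evaluation note above.
HARD_TASKS = {"AIME25", "HMMT25", "LiveCodeBench", "OJBench", "CFEval", "PolyMATH"}

def max_new_tokens_for(benchmark: str) -> int:
    # 81,920 tokens for reasoning/coding/PolyMATH, 32,768 for everything else.
    return 81_920 if benchmark in HARD_TASKS else 32_768
```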
