June 18, 2026
Zhipu AI, the Chinese lab also known as Z.ai, has released GLM-5.2, an open-weight large language model with a 1 million token context window and a permissive MIT license. GLM-5.2 is a Mixture-of-Experts model with around 744 billion total parameters and roughly 40 billion active per token, and on independent tests it matches or beats OpenAI’s GPT-5.5 on several long-horizon coding benchmarks at about one sixth of the cost. The model first launched on June 13, 2026 through the GLM Coding Plan, with open weights and a standalone API rolling out across providers in the days that followed.
What is GLM-5.2 and what can it do?
GLM-5.2 is Zhipu AI’s latest flagship foundation model, positioned for long-horizon, agentic work such as multi-step coding. It uses a Mixture-of-Experts architecture with about 744 billion total parameters, of which roughly 40 billion are active for any given token, which keeps inference costs lower than a dense model of similar size. The model supports a 1 million token context window, roughly five times the 200,000 token window of its predecessor GLM-5.1, and can return up to 131,072 tokens in a single response.
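As a back-of-the-envelope check on those figures, the short Python sketch below works out the share of parameters active per token and the growth in context window. The inputs are the approximate values quoted above, so the results are estimates, and the variable names are illustrative only.

```python
# Approximate figures quoted in the article; not official specifications.
TOTAL_PARAMS = 744e9         # total parameters (approximate)
ACTIVE_PARAMS = 40e9         # parameters active per token (approximate)
CONTEXT_GLM_5_2 = 1_000_000  # context window in tokens
CONTEXT_GLM_5_1 = 200_000    # predecessor's context window in tokens
MAX_OUTPUT_TOKENS = 131_072  # maximum tokens in a single response

# Share of the model's parameters that participate in any one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# How much larger the new context window is than the old one.
context_ratio = CONTEXT_GLM_5_2 / CONTEXT_GLM_5_1

# Maximum response length expressed as a share of the context window.
output_share = MAX_OUTPUT_TOKENS / CONTEXT_GLM_5_2

print(f"Active share per token: {active_fraction:.1%}")
print(f"Context window growth: {context_ratio:.0f}x")
print(f"Max output as share of context: {output_share:.1%}")
```

On these numbers only about one parameter in eighteen is active for a given token, which is the basis for the lower inference cost relative to a dense model of the same total size.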
GLM-5.2 adds two selectable reasoning modes, called High and Max thinking effort. Z.ai recommends the Max setting for complex, multi-step coding work where the model needs to plan and revise across long sequences, while High is aimed at faster everyday use. The release is built for coding agents in particular, with day-one support for eight agentic development environments.
GLM-5.2 benchmarks and technical specifications
Zhipu shipped GLM-5.2 without an official benchmark suite at launch, so the early figures come from independent evaluations. On those tests GLM-5.2 is the strongest open-weight model on standard coding benchmarks, scoring 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, and ranking second on Code Arena Frontend. On Artificial Analysis’s Intelligence Index it scored 51, the highest of any open-weight model. For reference, its predecessor GLM-5.1 reached 77.8 percent on SWE-bench Verified, a different benchmark, so that figure is not directly comparable with the SWE-bench Pro score.
The main architectural change is a feature Z.ai calls IndexShare. It shares a single indexer across each group of four sparse attention layers, which the company says reduces per-token compute by a factor of about 2.9 at the full 1 million token context length. That is what makes the 1 million token window usable in practice rather than only on paper, since it keeps the cost of very long inputs from growing as steeply as it otherwise would.
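Z.ai's description of IndexShare goes no further than the sentence above, so the following Python sketch is only a toy illustration of the sharing pattern, not the model's implementation. It assumes layers are grouped consecutively in fours; the function names and the 12-layer count are hypothetical.

```python
def indexer_id_for_layer(layer: int, group_size: int = 4) -> int:
    """Return the index of the shared indexer a given layer would use.

    Assumption: layers are grouped consecutively, so layers 0-3 share
    indexer 0, layers 4-7 share indexer 1, and so on.
    """
    if layer < 0:
        raise ValueError("layer must be non-negative")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return layer // group_size


def count_indexer_runs(num_layers: int, group_size: int = 4) -> int:
    """Count how many distinct indexers serve num_layers layers."""
    return len({indexer_id_for_layer(i, group_size) for i in range(num_layers)})


NUM_LAYERS = 12  # hypothetical layer count, chosen only for illustration

assignment = [indexer_id_for_layer(i) for i in range(NUM_LAYERS)]
shared_runs = count_indexer_runs(NUM_LAYERS)                   # one per group of four
unshared_runs = count_indexer_runs(NUM_LAYERS, group_size=1)   # one per layer

print(assignment)
print(f"Indexer runs with sharing: {shared_runs}, without: {unshared_runs}")
```

The toy count covers indexer runs only. It says nothing about overall compute, for which the only available figure is the company's own estimate of roughly 2.9 times.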
How does GLM-5.2 compare to GPT-5.5 and Claude Opus 4.8?
On several long-horizon coding benchmarks GLM-5.2 matches or beats OpenAI’s GPT-5.5 while costing roughly one sixth as much per token. On FrontierSWE it lands within about one percent of Anthropic’s Claude Opus 4.8, the current proprietary leader on many coding tasks. That places GLM-5.2 close to the top closed models on long-horizon coding work, while remaining fully open weight.
The release also has a competitive backdrop. GLM-5.2 arrived about 48 hours after United States export rules forced Anthropic to disable its Fable 5 and Mythos 5 models for foreign nationals on June 12, 2026. An open-weight model under an MIT license that can be self-hosted anywhere is not subject to the same access restrictions, which is part of why GLM-5.2 has drawn attention from teams outside the United States.
GLM-5.2 availability, pricing, and open weights
GLM-5.2 is released under an MIT license with no regional limits, which permits self-hosting, fine-tuning, and commercial use. The official weights are published on Hugging Face under zai-org/GLM-5.2 and on ModelScope. The model first became available on June 13, 2026 through the GLM Coding Plan, and a standalone API and broader provider support followed over the next several days.
API access is priced at 1.40 dollars per million input tokens and 4.40 dollars per million output tokens, which Z.ai positions at roughly one sixth the cost of comparable frontier models and about one tenth the per-token price of GPT-5 or Claude. Combined with the open weights, that pricing is aimed at developers running long, token-heavy coding agents who want to control both cost and where the model runs.
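To make the per-token pricing concrete, the Python sketch below estimates the cost of a single request from the two list prices quoted above. The function name and the example token counts are hypothetical, the prices are assumed to be in US dollars, and the estimate ignores any caching discounts, tiers, or provider markups that may apply.

```python
from decimal import Decimal

# List prices quoted in the article, assumed to be US dollars per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.40")
OUTPUT_PRICE_PER_MILLION = Decimal("4.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    # Round to a hundredth of a cent for readability.
    return (input_cost + output_cost).quantize(Decimal("0.0001"))


# Hypothetical long-context agent request: 800,000 tokens in, 100,000 tokens out.
example_cost = estimate_cost(800_000, 100_000)
print(f"Estimated cost: {example_cost} dollars")
```

For that hypothetical request, the input side accounts for 1.12 dollars and the output side for 0.44 dollars, so long prompts dominate the bill even though output tokens cost more than three times as much each.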
Full technical details and model weights are available in Z.ai’s documentation on GLM-5.2.